Abstract


This paper studies identifiability and convergence behaviors for parameters of multiple 
types in finite mixtures, and the effects of model fitting with extra mixing components. 
First, we present a general theory for strong identifiability, which extends the previous
work of Nguyen [2013] and Chen [1995] to address a broad range of mixture models and
to handle matrix-variate parameters. These models are shown to share the same Wasserstein
distance based optimal rates of convergence for the space of mixing distributions:
$n^{-1/2}$ under $W_1$ for the exact-fitted and $n^{-1/4}$ under $W_2$ for the over-fitted setting, where
$n$ is the sample size. This theory, however, is not applicable to several important model
classes, including location-scale multivariate Gaussian mixtures, shape-scale Gamma mixtures
and location-scale-shape skew-normal mixtures. The second part of this work is devoted
to demonstrating that for these "weakly identifiable" classes, algebraic structures of
the density family play a fundamental role in determining convergence rates of the model
parameters, which display a very rich spectrum of behaviors. For instance, the optimal rate
of parameter estimation in an over-fitted location-covariance Gaussian mixture is precisely
determined by the order of a solvable system of polynomial equations, and these rates
deteriorate rapidly as more extra components are added to the model. The established rates for a
variety of settings are illustrated by a simulation study.


1 Introduction 


Mixture models are popular modeling tools for making inference about heterogeneous data [Lindsay,
1995; McLachlan and Basford, 1988]. Under mixture modeling, data are viewed as samples from
a collection of unobserved or latent subpopulations, each of which posits its own distribution and
associated parameters. Learning about subpopulation-specific parameters is essential to understanding
the underlying heterogeneity. Theoretical issues related to parameter estimation in mixture models,
however, remain poorly understood. As noted in a recent textbook [DasGupta, 2008] (pg. 571), "mixture
models are riddled with difficulties such as nonidentifiability".


Research about parameter identifiability for mixture models goes back to the early work of Teicher
[1961, 1963], Yakowitz and Spragins [1968] and others, and continues to attract much interest [Hall and
Zhou, 2003; Hall et al., 2005; Elmore et al., 2005; Allman et al., 2009]. To address parameter estimation
rates, a natural approach is to study the behavior of mixing distributions that arise in the mixture model.
This approach is well-developed in the context of nonparametric deconvolution [Carroll and Hall, 1988;
Zhang, 1990; Fan, 1991], but these results are confined to only a specific type of model, the location
mixtures. Beyond location mixtures there have been far fewer results. In particular, for finite mixture
models, a notable contribution was made by Chen, who proposed a notion of strong identifiability and
established the convergence of the mixing distribution for a class of over-fitted finite mixtures [Chen,
1995]. Over-fitted finite mixtures, as opposed to exact-fitted ones, are mixtures that allow extra mixing
components in their model specification, when the actual number of mixing components is bounded
by a known constant. Chen's work, however, was restricted to models that have only a single scalar
parameter. This restriction was effectively removed by Nguyen, who showed that Wasserstein distances
(cf. [Villani, 2009]) provide a natural source of metrics for deriving rates of convergence of mixing
distributions [Nguyen, 2013]. He established rates of convergence of mixing distributions for a number of
finite and infinite mixture models with multi-dimensional parameters. Rousseau and Mengersen studied
over-fitted mixtures in a Bayesian estimation setting [Rousseau and Mengersen, 2011]. Although
they did not focus on mixing distributions per se, they showed that the mixing probabilities associated
with extra mixing components vanish at a standard rate, subject to a strong identifiability
condition on the density class. Finally, we mention a related literature in computer science, which
focuses almost exclusively on the analysis of computationally efficient procedures for clustering with
exact-fitted Gaussian mixtures (e.g., [Dasgupta, 1999; Belkin and Sinha, 2010; Kalai et al., 2012]).

Due to requirements of strong identifiability, the existing theories described above are applicable to 
only certain classes of mixture models, typically those that carry a single parameter type. Finite mixture 
models with multiple varying parameters (location, scale, shape, covariance matrix) are considerably 
more complex and many do not satisfy such strong identifiability assumptions. They include
location-scale mixtures of Gaussians, shape-scale mixtures of Gammas, and location-scale-shape mixtures of
skew-normals (also known as skew-Gaussians). A theory for such models remains open.


Setting The goal of this paper is to establish rates of convergence for parameters of multiple types,
including matrix-variate parameters, that arise in a variety of finite mixture models. Assume that each
subpopulation is distributed by a density function (with respect to Lebesgue measure on a Euclidean
space $\mathcal{X}$) that belongs to a known density class
$\{f(x|\theta,\Sigma),\ \theta \in \Theta \subset \mathbb{R}^{d_1},\ \Sigma \in \Omega \subset S_{d_2}^{++},\ x \in \mathcal{X}\}$.
Here, $d_1 \geq 1$, $d_2 \geq 0$, and $S_{d_2}^{++}$ is the set of all $d_2 \times d_2$ symmetric positive definite matrices. A finite
mixture density with $k$ mixing components can be defined in terms of $f$ and a discrete mixing measure
$G = \sum_{i=1}^{k} p_i \delta_{(\theta_i,\Sigma_i)}$ with $k$ support points as follows:

\[
p_G(x) = \int f(x|\theta,\Sigma)\, dG(\theta,\Sigma) = \sum_{i=1}^{k} p_i f(x|\theta_i,\Sigma_i).
\]

Examples for $f$ studied in this paper include the location-covariance family (when $d_1 = d_2 \geq 1$)
under Gaussian or some elliptical families of distributions, the location-covariance-shape family (when
$d_1 > d_2$) under the generalized multivariate Gaussian, skew-Gaussian or the exponentially modified
Student's t-distribution, and the location-rate-shape family (when $d_1 = 3$, $d_2 = 0$) under Gamma or
other distributions. The combination of location parameter with covariance matrix, shape and rate
parameters in mixture modeling enables a richer and more accurate description of heterogeneity, but the
interaction among varying parameter types can be complex, resulting in varied identifiability and
convergence behaviors. In addition, we shall treat the settings of exact-fitted mixtures and over-fitted
mixtures separately, as the latter typically carries more complex behavior than the former.
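As a concrete illustration of the mixture density $p_G$ defined in this paragraph, the following minimal Python sketch evaluates a finite mixture with a univariate Gaussian kernel. The function names `gaussian_pdf` and `mixture_density`, and the encoding of $G$ as a list of `(weight, mean, variance)` triples, are assumptions made here for illustration; they are not notation from the paper.

```python
import math

def gaussian_pdf(x, mean, var):
    """Univariate Gaussian density; stands in for the kernel f(x | theta, Sigma)."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def mixture_density(x, atoms):
    """Evaluate p_G(x) = sum_i p_i * f(x | theta_i, Sigma_i).

    `atoms` is a list of (p_i, theta_i, Sigma_i) triples, a hypothetical
    encoding of the discrete mixing measure G.
    """
    total_weight = sum(p for p, _, _ in atoms)
    if abs(total_weight - 1.0) > 1e-12:
        raise ValueError("mixing proportions must sum to one")
    return sum(p * gaussian_pdf(x, theta, var) for p, theta, var in atoms)

# Exact-fitted G_0 with k_0 = 2 atoms, and an over-fitted G with k = 3 atoms
# whose two extra-close atoms split the second component of G_0.
G0 = [(0.4, -1.0, 1.0), (0.6, 2.0, 0.5)]
G = [(0.4, -1.0, 1.0), (0.3, 1.9, 0.5), (0.3, 2.1, 0.5)]
```

The over-fitted measure `G` has more atoms than `G0`, yet the two mixture densities are close pointwise, which is the situation the over-fitted theory is designed to handle.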

As shown by Nguyen, the convergence of mixture model parameters can be measured in terms of a
Wasserstein distance on the space of mixing measures $G$ [Nguyen, 2013]. Let $G = \sum_{i=1}^{k} p_i \delta_{(\theta_i,\Sigma_i)}$
and $G_0 = \sum_{i=1}^{k_0} p_i^0 \delta_{(\theta_i^0,\Sigma_i^0)}$ be two discrete probability measures on $\Theta \times \Omega$, which is equipped with
metric $\rho$. Recall the Wasserstein distance of order $r$, for a given $r \geq 1$:

\[
W_r(G,G_0) = \left( \inf_{q} \sum_{i,j} q_{ij}\, \rho^r\big((\theta_i,\Sigma_i),(\theta_j^0,\Sigma_j^0)\big) \right)^{1/r},
\]

where the infimum is taken over all joint probability distributions $q$ on $[1,\ldots,k] \times [1,\ldots,k_0]$ such
that, when expressing $q$ as a $k \times k_0$ matrix, the marginal constraints hold: $\sum_{j} q_{ij} = p_i$ and $\sum_{i} q_{ij} = p_j^0$.

Suppose that a sequence of mixing measures $G_n \to G_0$ under the $W_r$ metric at a rate $\omega_n = o(1)$. If all
$G_n$ have the same number of atoms $k = k_0$ as that of $G_0$, then the set of atoms of $G_n$ converges to
the $k_0$ atoms of $G_0$ at the same rate $\omega_n$ under the $\rho$ metric. If $G_n$ have a varying number $k_n \in [k_0, k]$ of
atoms, where $k$ is a fixed upper bound, then a subsequence of $G_n$ can be constructed so that each atom
of $G_0$ is a limit point of a certain subset of atoms of $G_n$; the convergence to each such limit also
happens at rate $\omega_n$. Some atoms of $G_n$ may have limit points that are not among $G_0$'s atoms; the
mass associated with those atoms of $G_n$ must vanish at the generally faster rate $\omega_n^r$.
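The behavior just described can be checked numerically in a simplified setting. The sketch below computes $W_r$ between two discrete measures on the real line, assuming scalar atoms and the ground metric $|x - y|$; in one dimension the monotone (sorted) coupling is optimal for the convex cost $|x - y|^r$ with $r \geq 1$, so no linear program is needed. The function name `wasserstein_1d` is hypothetical, and atoms with matrix-variate parts would require a general optimal transport solver instead.

```python
def wasserstein_1d(atoms_a, weights_a, atoms_b, weights_b, r=1.0):
    """W_r between two discrete measures on the real line, for r >= 1.

    Uses the monotone coupling of the sorted atoms, which is optimal in
    one dimension for the cost |x - y|**r (an assumption of this sketch:
    scalar atoms only).
    """
    a = sorted(zip(atoms_a, weights_a))
    b = sorted(zip(atoms_b, weights_b))
    i, j = 0, 0
    mass_a, mass_b = a[0][1], b[0][1]
    cost = 0.0
    while True:
        moved = min(mass_a, mass_b)      # mass q_ij sent from atom i to atom j
        cost += moved * abs(a[i][0] - b[j][0]) ** r
        mass_a -= moved
        mass_b -= moved
        if mass_a <= 1e-15:              # atom i of the first measure is used up
            i += 1
            if i == len(a):
                break
            mass_a = a[i][1]
        if mass_b <= 1e-15:              # atom j of the second measure is used up
            j += 1
            if j == len(b):
                break
            mass_b = b[j][1]
    return cost ** (1.0 / r)

eps = 0.01
# Over-fitted G_n: one atom stays at distance 1 from the single atom of G_0
# but carries mass eps**2, so W_2(G_n, G_0) is still only of order eps.
w2 = wasserstein_1d([0.0, 1.0], [1.0 - eps**2, eps**2], [0.0], [1.0], r=2.0)
```

Here the stray atom does not converge to any atom of $G_0$, yet $W_2$ tends to zero at rate `eps` because the stray mass vanishes at the faster rate `eps**2`, in line with the $\omega_n^r$ statement.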

In order to establish the rates of convergence for the mixing measure $G$, our strategy is to derive
sharp bounds which relate the Wasserstein distance of mixing measures $G, G'$ and a distance between
corresponding mixture densities $p_G, p_{G'}$, such as the variational distance $V(p_G, p_{G'})$. It is relatively
simple to obtain upper bounds for the variational distance of mixing densities ($V$ for short) in terms
of Wasserstein distances $W_r(G,G')$ (shorthanded by $W_r$). Establishing (sharp) lower bounds for $V$ in
terms of $W_r$ is the main challenge. Such a bound may not hold, due to a possible lack of identifiability
of the mixing measures: one may have $p_G = p_{G'}$, so clearly $V = 0$, but $G \neq G'$, so that $W_r \neq 0$.


General theory of strong identifiability The classical identifiability condition requires that $p_G =
p_{G'}$ entails $G = G'$. This amounts to the linear independence of elements $f$ in the density class [Teicher,
1963]. In order to establish quantitative lower bounds on a distance of mixture densities, we introduce
several notions of strong identifiability, extending the definition of Chen [1995] to handle multiple
parameter types, including matrix-variate parameters. There are two kinds of strong identifiability. One
such notion involves taking the first-order derivatives of the function $f$ with respect to all parameters in
the model, and insisting that these quantities be linearly independent in a sense to be precisely defined.
This criterion will be called "strong identifiability in the first order", or simply first-order
identifiability. When the second-order derivatives are also involved, we obtain the second-order identifiability
criterion. It is worth noting that prior studies on parameter estimation rates tend to center primarily on the
second-order identifiability condition or something even stronger [Chen, 1995; Liu and Shao, 2004;
Rousseau and Mengersen, 2011; Nguyen, 2013]. We show that for exact-fitted mixtures, the first-order
identifiability condition (along with some additional regularity conditions) suffices for obtaining that

\[
V(p_G, p_{G_0}) \gtrsim W_1(G, G_0), \tag{1}
\]

when $W_1(G, G_0)$ is sufficiently small. Moreover, for a broad range of density classes, we also have
$V \lesssim W_1$, for which we actually obtain $V(p_G, p_{G_0}) \asymp W_1(G, G_0)$. A consequence of this fact is that
for any estimation procedure that admits the $n^{-1/2}$ convergence rate for the mixture density under $V$
distance, the mixture model parameters also converge at the same rate under Euclidean metric.

Turning to the over-fitted setting, second-order identifiability along with mild regularity conditions
would be sufficient for establishing that for any $G$ that has at most $k$ support points, where $k \geq k_0 + 1$
and $k$ is fixed,

\[
V(p_G, p_{G_0}) \gtrsim W_2^2(G, G_0), \tag{2}
\]

when $W_2(G, G_0)$ is sufficiently small. The lower bound $W_2^2(G, G_0)$ is sharp, i.e., we cannot improve
the lower bound to $W_1^r$ for any $r < 2$ (notably, $W_2 \geq W_1$). A consequence of this result is that, for
any standard estimation method (such as the MLE) which yields the $n^{-1/2}$ convergence rate for $p_G$,
the induced rate of convergence for the mixing measure $G$ is the minimax optimal $n^{-1/4}$ under $W_2$.
It also follows that the mixing probability mass converges at the $n^{-1/2}$ rate (which recovers the result of
Rousseau and Mengersen [2011]), in addition to showing that the component parameters converge at
the $n^{-1/4}$ rate.
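The passage from the density rate to the parameter rate is a short calculation. As a sketch, suppose an estimator $\hat{G}_n$ (notation introduced here for illustration only) satisfies $V(p_{\hat{G}_n}, p_{G_0}) \lesssim n^{-1/2}$, with constants and logarithmic factors suppressed. Combining this with inequality (2) gives:

```latex
W_2^2(\hat{G}_n, G_0) \;\lesssim\; V(p_{\hat{G}_n}, p_{G_0}) \;\lesssim\; n^{-1/2}
\quad\Longrightarrow\quad
W_2(\hat{G}_n, G_0) \;\lesssim\; n^{-1/4}.
```

The same argument applied to a bound of the form $V \gtrsim W_r^r$ yields $W_r \lesssim n^{-1/(2r)}$, which is why a larger exponent $r$ in the lower bound translates into a slower parameter rate.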


We also show that there is a range of mixture models with varying parameters of multiple types 
that satisfies the developed strong identifiability criteria. All such models exhibit the same kind of rate 
for parameter estimation. In particular, the second-order identifiability criterion (thus the first-order 
identifiability) is satisfied by many density families $f$, including the multivariate Student's t-distribution
and the exponentially modified multivariate Student's t-distribution. Second-order identifiability also holds
for several mixture models with multiple types of (scalar) parameters. These results are presented in
Section 3.2. The proofs of these characterization theorems are rather technical, but one useful insight
one can draw from them is that the strong identifiability condition (in either the first or the second order) 
is essentially determined by the smoothness of the kernel density in question (which can be expressed 
in terms of how fast the corresponding characteristic function vanishes toward infinity). 


Theory for weakly identifiable classes We hasten to point out that many common density classes
do not satisfy either or both strong identifiability criteria. The Gamma family of distributions (with
both shape and scale parameters varying) is not identifiable in the first order. Neither is the family of
skew-Gaussian distributions [Azzalini and Capitanio, 1999; Azzalini and Valle, 1996]. Convergence
behavior for the mixture parameters of these two families is unknown, in both exact and over-fitted
settings. The ubiquitous Gaussian family, when both location and scale/covariance parameters vary, is
identifiable in the first order, but not in the second order. So, the general theory described above can
be applied to analyze exact-fitted Gaussian mixtures, but not over-fitted Gaussian mixtures. It turns
out that these classes of mixture models require a separate and novel treatment. Throughout this work,
we shall call such density families "weakly identifiable classes", i.e., those that are identifiable in the
classical sense, but not in the sense of strong identifiability taken in either the first or second order.

Weak identifiability leads to an extremely rich (and previously unreported) spectrum of convergence
behavior. It is no longer possible to establish inequalities (1) and (2), because they do not hold in
general. Instead, we shall be able to establish sharp bounds of the type $V \gtrsim W_r^r$ for some precise
value of $r$, which depends on the specific class of density in consideration. This entails minimax
optimal but non-standard rates of convergence for mixture model parameters. In our theory for these
weakly identifiable classes, the algebraic structure of the density $f$, not merely its smoothness, will
now play the fundamental role in determining the rates.

Gaussian mixtures: We will first discuss the Gaussian family of densities of the standard form
$f(x|\theta,\Sigma)$, where $\theta \in \mathbb{R}^d$ and $\Sigma \in S_d^{++}$ are mean and covariance parameters, respectively. The lack of
strong identifiability in the second order is due to the following identity:

\[
\frac{\partial^2 f}{\partial \theta^2}(x|\theta,\Sigma) = 2\, \frac{\partial f}{\partial \Sigma}(x|\theta,\Sigma),
\]

which entails that the derivatives of $f$ taken with respect to the parameters up to the second order are
not linearly independent. Moreover, this algebraic structure plays the fundamental role in our proof of
the following inequality:

\[
V(p_G, p_{G_0}) \gtrsim W_{\bar{r}}^{\bar{r}}(G, G_0), \tag{3}
\]

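| Density classes | Exact-fitted mixtures | Over-fitted mixtures | MLE rate for $G$ for $n$-iid sample | Minimax lower bound for $G$ |
|---|---|---|---|---|
| (I) First-order identifiable: generalized Gaussian, Student's t, ... | $V \gtrsim W_1$ | | Exact-fit: $W_1 \lesssim n^{-1/2}$ | Exact-fit: $W_1 \gtrsim n^{-1/2}$ |
| (II) Second-order identifiable: Student's t, exponentially modified Student's t, ... | same as (I) | $V \gtrsim W_2^2$ | Exact-fit: same as (I). Over-fit: $W_2 \lesssim n^{-1/4}$ | Exact-fit: same as (I). Over-fit: $W_1 \gtrsim n^{-1/4}$ |
| Not second-order identifiable: location-scale multivariate Gaussian | same as (I) | $V \gtrsim W_{\bar{r}}^{\bar{r}}$, $\bar{r}$ depending on $k - k_0$. If $k - k_0 = 1$, $\bar{r} = 4$. If $k - k_0 = 2$, $\bar{r} = 6$ | Exact-fit: same as (I). Over-fit: $W_{\bar{r}} \lesssim n^{-1/(2\bar{r})}$ | Exact-fit: same as (I). Over-fit: $W_1 \gtrsim n^{-1/(2\bar{r})}$ |
| Not first-order identifiable: Gamma distribution, generic case | $V \gtrsim W_1$ | $V \gtrsim W_2^2$ | $W_1 \lesssim n^{-1/2}$ or $W_2 \lesssim n^{-1/4}$ | $W_1 \gtrsim n^{-1/2}$, $W_2 \gtrsim n^{-1/4}$ |
| Not first-order identifiable: Gamma distribution, pathological case | $V \not\gtrsim W_1^r$ for any $r \geq 1$ | $V \not\gtrsim W_1^r$ for any $r \geq 1$ | unknown | logarithmic, i.e. $W_r \gtrsim n^{-1/r}$ for all $r \geq 1$ |
| Not first-order identifiable: location-exponential distribution | | | unknown | logarithmic, $W_1 \gtrsim n^{-1/r}$ for all $r \geq 1$ |
| Not first-order identifiable: skew-Gaussian distribution, generic case | $V \gtrsim W_1$ | $V \gtrsim W_m^m$, where $m = \bar{r}$ or $\bar{r} + 1$ | Exact-fit: $W_1 \lesssim n^{-1/2}$ | Exact-fit: $W_1 \gtrsim n^{-1/2}$ |
| Skew-Gaussian distribution, pathological conformant case | $V \gtrsim W_2^2$ | unknown | Exact-fit: $W_2 \lesssim n^{-1/4}$ | Exact-fit: $W_2 \gtrsim n^{-1/4}$ |
| Skew-Gaussian distribution, pathological non-conformant case | $V \gtrsim W_{\bar{s}}^{\bar{s}}$ for some $\bar{s}$ | unknown | Exact-fit: $W_{\bar{s}} \lesssim n^{-1/(2\bar{s})}$ | Exact-fit: $W_3 \gtrsim n^{-1/6}$, or $W_4 \gtrsim n^{-1/8}$, or $W_5 \gtrsim n^{-1/10}$, or ... |
| Skew-Gaussian distribution, otherwise | $V \not\gtrsim W_1^r$ for any $r \geq 1$ | unknown | Exact-fit: unknown | Exact-fit: logarithmic |
| Skew-Gaussian distribution, over-fitted setting | | | Over-fit: $n^{-1/(2m)}$ or unknown | Over-fit: unknown |
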
Table 1: Summary of results established in this paper. To be precise, all upper bounds for MLE rates
are of the form $(\log n/n)^{\gamma}$, but the logarithmic term is removed in the table to avoid cluttering.

where $\bar{r} \geq 1$ is defined as the minimum value of $r \geq 1$ such that the following system of polynomial
equations

\[
\sum_{j=1}^{k-k_0+1} \; \sum_{n_1 + 2 n_2 = \alpha} \frac{c_j^2\, a_j^{n_1}\, b_j^{n_2}}{n_1!\, n_2!} = 0 \quad \text{for all } 1 \leq \alpha \leq r
\]

does not have any non-trivial real solution $\{(c_j, a_j, b_j)\}_{j=1}^{k-k_0+1}$. We emphasize that the lower bound in
Eq. (3) is sharp, in that it cannot be replaced by $W_1^r$ (or $W_r^r$) for any $r < \bar{r}$. A consequence of this fact,
by invoking standard results from asymptotic statistics, is that the minimax optimal rate of convergence
for estimating $G$ is $n^{-1/(2\bar{r})}$ under the $W_{\bar{r}}$ distance metric. The authors find this correspondence quite
striking: it links precisely the minimax optimal estimation rate of mixing measures arising
from an over-fitted Gaussian mixture to the solvability of an explicit system of polynomial equations.
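To make the role of the polynomial system concrete, the sketch below evaluates its left-hand side in exact rational arithmetic. It assumes the summand has the form $c_j^2 a_j^{n_1} b_j^{n_2} / (n_1!\, n_2!)$ with the inner sum running over non-negative integers satisfying $n_1 + 2 n_2 = \alpha$; the function name `system_lhs` and the candidate solution are illustrative choices, not taken from the paper.

```python
from fractions import Fraction
from math import factorial

def system_lhs(alpha, solution):
    """Left-hand side of the alpha-th equation:
    sum_j sum_{n1 + 2*n2 = alpha} c_j^2 * a_j^n1 * b_j^n2 / (n1! * n2!).

    `solution` is a list of (c_j, a_j, b_j) triples. The summand is the
    form assumed in the lead-in of this sketch.
    """
    total = Fraction(0)
    for c, a, b in solution:
        for n2 in range(alpha // 2 + 1):
            n1 = alpha - 2 * n2
            total += Fraction(c) ** 2 * Fraction(a) ** n1 * Fraction(b) ** n2 / (
                factorial(n1) * factorial(n2)
            )
    return total

# Candidate for k - k_0 = 1 (two triples): symmetric a_j and b_j = -a_j^2 / 2,
# with all c_j nonzero.
candidate = [(1, 1, Fraction(-1, 2)), (1, -1, Fraction(-1, 2))]
values = {alpha: system_lhs(alpha, candidate) for alpha in range(1, 5)}
```

Under these assumptions the candidate satisfies the equations for $\alpha = 1, 2, 3$ but leaves a nonzero residual at $\alpha = 4$. This is consistent with, though of course no proof of, the statement that the system first becomes unsolvable at order 4 when there is one extra component.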

Determining the solvability of a system of polynomial equations is a basic question in (computational)
algebraic geometry. For the system described above, there does not seem to be an obvious
answer to the general value of $\bar{r}$. Since the number of variables in this system is $3(k - k_0 + 1)$, one
expects that $\bar{r}$ keeps increasing as $k - k_0$ increases. In fact, using a standard method of Groebner
bases [Buchberger, 1965], we can show that for $k - k_0 = 1$ and $2$, $\bar{r} = 4$ and $6$, respectively. In
addition, if $k - k_0 \geq 3$, then $\bar{r} \geq 7$. Thus, the convergence rate of the mixing measure for the over-fitted
Gaussian mixture deteriorates very quickly as more extra components are included in the model.
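The fourth-order behavior for one extra component can be observed numerically. The following sketch takes $G_0$ to be a single standard Gaussian and an over-fitted $G$ with two equally weighted atoms at means $\pm\varepsilon$ and common variance $1 - \varepsilon^2$, a construction chosen here for illustration so that the low-order terms cancel. The atoms of $G$ lie at distance of order $\varepsilon$ from the atom of $G_0$, while the variational distance, computed by the trapezoid rule, shrinks roughly like $\varepsilon^4$. All function names are hypothetical.

```python
import math

def normal_pdf(x, mean, var):
    """Univariate Gaussian density with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def variational_distance(eps, lo=-12.0, hi=12.0, steps=4800):
    """V(p_G, p_G0) = (1/2) * integral |p_G - p_G0| via the trapezoid rule.

    G0 is a single N(0, 1) component; G has two equally weighted atoms
    (-eps, 1 - eps^2) and (eps, 1 - eps^2) in (mean, variance) coordinates.
    """
    h = (hi - lo) / steps
    var = 1.0 - eps ** 2
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        p_g = 0.5 * normal_pdf(x, -eps, var) + 0.5 * normal_pdf(x, eps, var)
        diff = abs(p_g - normal_pdf(x, 0.0, 1.0))
        total += diff if 0 < i < steps else 0.5 * diff   # half weight at endpoints
    return 0.5 * h * total

v_coarse = variational_distance(0.2)
v_fine = variational_distance(0.1)
ratio = v_coarse / v_fine   # near 2**4 if V scales like eps**4
```

Halving $\varepsilon$ reduces $V$ by a factor near $2^4$ rather than $2^1$ or $2^2$, so along this sequence no bound of the form $V \gtrsim W_1^r$ with $r < 4$ can hold.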

Gamma mixtures: We shall now briefly describe several other model classes studied in this paper.
Gamma densities represent one such class: the Gamma density $f(x|a,b)$ has two positive parameters, $a$
for shape and $b$ for rate. This family is not identifiable in the first order. The lack of identifiability boils
down to the fundamental identity (10). By exploiting this identity, we can show that there are particular
combinations of the true parameter values which prevent the Gamma class from enjoying strong
convergence properties. By excluding the measure-zero set of pathological cases of true mixing measures,
the Gamma density class in fact can be shown to be strongly identifiable in both orders. Thus, this class
is almost strongly identifiable, using the terminology of Allman et al. [2009]. The generic/pathological
dichotomy in the convergence behavior within the Gamma class is quite interesting: in the measure-one
generic set of true mixing measures, the mixing measure can be estimated at the standard rate (i.e.,
$n^{-1/2}$ under $W_1$ for exact-fitted and $n^{-1/4}$ under $W_2$ for over-fitted mixtures). The pathological cases
are not so forgiving: even for exact-fitted mixtures, one can do no better than a logarithmic rate of
convergence.

Location-exponential mixtures: Lest some wonder whether this unusually slow rate for the
exact-fitted mixture setting can happen only in the measurably negligible (pathological) cases, we also
introduce a location-extension of the Gamma family, the location-exponential class:
$f(x|\theta,\sigma) := \frac{1}{\sigma} \exp\left(-\frac{x-\theta}{\sigma}\right) 1(x > \theta)$. We show that the minimax lower bound for estimating the mixing measure
in an exact-fitted mixture of location-exponentials is no faster than a logarithmic rate.

Skew-Gaussian mixtures: The most fascinating example among those studied is perhaps the
skew-Gaussian distributions. This density class generalizes the Gaussian distributions by having an extra
parameter, shape, which controls density skewness. The skew-Gaussian family exhibits an extremely
broad spectrum of behavior, some of which is shared with the Gamma family, some with the Gaussian,
but this family is really in a league of its own. It is not identifiable in the first order, for a reason that is
somewhat similar to that of the Gamma family described above. As a consequence, one can construct
a full measure set of generic cases for the true mixing measures according to which the exact-fitted
mixture model admits strong identifiability and convergence rate (as in the general theory).

Within the seemingly benign setting of exact-fitted mixtures, the pathological cases for the
skew-Gaussian carry a very rich structure, resulting in a variety of behaviors: for some subset of true mixing
measures, the convergence rate is tied to solvability of a certain system of polynomial equations; for
some other subset, the convergence is poor, and the rate can be logarithmic at best.

Turning to over-fitted mixtures of skew-Gaussian distributions, unfortunately our theory remains
incomplete. The culprit lies in the fundamental identity (15), which shows that the first and second
order derivatives of the skew-Gaussian densities are dependent in a nonlinear manner. This is in
contrast to the linear dependence that characterizes Gaussian and Gamma densities. Thus, the method
of proof that works well for the previous examples is no longer adequate, and the rates obtained are
probably not optimal.


Key proof ideas We now provide a brief description of our method of proofs for the results obtained
in this paper, a summary of which is given in Table 1. There are two different theories: a general
theory for the strongly identifiable classes and a specialized theory for weakly identifiable classes. Within
each model class, the key technical objective is the same: to derive sharp inequalities of the form
$V(p_G, p_{G_0}) \gtrsim W_r^r(G, G_0)$, where sharpness is expressed in the choice of $r$.

For strongly identifiable classes, either in the first or the second order, the starting point of our
proof is an application of Taylor expansion on the mixture density difference $p_{G_n} - p_{G_0}$, where $G_n$
represents a sequence of mixing measures that tend to $G_0$ in Wasserstein distance $W_r$, where $r = 1$ or
$2$, the assumed order of strong identifiability. The main part of the proof involves trying to force all the
Taylor coefficients in the Taylor expansion to vanish according to the converging sequence of $G_n$. If
that is proved to be impossible, then one can arrive at the bound of the form $V \gtrsim W_r^r$. Thus, our proof
technique is similar to that of Nguyen [2013]. To show that the derived inequalities are sharp, we resort
to careful constructions of a "worst-case" sequence of $G_n$.

For weakly identifiable classes, the Taylor expansion technique continues to provide the proof's
backbone, but the key issue now is determining the "correct" order up to which the Taylor expansion is
exercised. Since high-order derivatives of the density $f$ are no longer independent, the dependence has
to be taken into account before one can fall back to a similar technique afforded by the general theory
described above. If the high-order derivatives are linearly dependent, as is the case of Gaussian
densities, it is possible to reduce the original Taylor expansion in terms of only a subset of such derivative
quantities that are linearly independent. This reduction process paves the way for a system of
polynomial equations to emerge. It follows then that the right exponent $r$ in the desired bound described
above can be linked to the order of such a system which admits a non-trivial solution.


Practical implications Problematic convergence behaviors exhibited by widely utilized models such
as Gaussian mixtures may have long been observed in practice, but to our knowledge, most of the
obtained convergence rates are established for the first time in this paper, particularly those of weakly
identifiable classes. The results established for the popular Gaussian class present a formal reminder
about the limitation of Gaussian mixtures when it comes to assessing the quality of parameter
estimation, but only when the number of mixing components is unknown. Since a tendency in practice
is to "over-fit" the mixture generously with many more extra mixing components, our theory warns
against this practice, because the convergence rate for subpopulation-specific parameters deteriorates
rapidly with the number of redundant components. In particular, we expect that the value $r$ in the rate
$n^{-1/(2r)}$ tends to infinity as the number of redundant Gaussian components increases to infinity. To
complete the spectrum of rates, we note the logarithmic rate of convergence of the mixing
measure in infinite Gaussian location mixtures, via a Bayes estimate [Nguyen, 2013] or kernel-based
deconvolution [Caillerie et al., 2011].

For Gamma and skew-Gaussian mixtures (for applications, see, e.g., [Ghosal and Roy, 2011; Lee and
McLachlan, 2013; Wiper et al., 2001]), our theory paints a wide spectrum of convergence behaviors within each
model class. We hope that the theoretical results obtained here may hint at practically useful ways
for determining benign scenarios when the mixture models enjoy strong identifiability properties and
favorable convergence rates, and for identifying pathological scenarios that practitioners would do
well to avoid.

Paper organization The rest of the paper is organized as follows. Section 2 provides some preliminary
background and facts. Section 3 presents a general theory of strong identifiability, by addressing
the exact-fitted and over-fitted settings separately before providing a characterization of density classes
for which the general theory is applicable. Section 4 is devoted to a theory for weakly identifiable
classes, by treating each of the described three density classes separately. Section 5.1 contains easy
consequences of the theory developed earlier; this includes minimax bounds and the convergence
rates of the maximum likelihood estimation, which are optimal in many cases. The theoretical bounds
are illustrated via simulations in Section 5.2. Self-contained proofs of representative theorems are given
in Section 6, while proofs of remaining results are presented in the Appendix.


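Notation Divergence distances studied in this paper include the total variational distance
$V(p_G, p_{G'}) = \frac{1}{2} \int |p_G(x) - p_{G'}(x)|\, d\mu(x)$ and the Hellinger distance
$h^2(p_G, p_{G'}) = \frac{1}{2} \int \left(\sqrt{p_G(x)} - \sqrt{p_{G'}(x)}\right)^2 d\mu(x)$.

Given $K, L \in \mathbb{N}$, the first derivative of a real function $g : \mathbb{R}^{K \times L} \to \mathbb{R}$ of matrix $\Sigma$ is defined as a $K \times L$
matrix whose $(i,j)$ element is $\partial g/\partial \Sigma_{ij}$. The second derivative of $g$, denoted by $\frac{\partial^2 g}{\partial \Sigma^2}$, is a $K^2 \times L^2$
matrix made of $KL$ blocks of $K \times L$ matrices, whose $(i,j)$-block is given by $\frac{\partial}{\partial \Sigma}\left(\frac{\partial g}{\partial \Sigma_{ij}}\right)$.
Additionally, given $N \in \mathbb{N}$, for a function $g_2 : \mathbb{R}^N \times \mathbb{R}^{K \times L} \to \mathbb{R}$ defined on $(\theta, \Sigma)$, the joint derivative between
the vector component and matrix component, $\frac{\partial^2 g_2}{\partial \theta\, \partial \Sigma} = \frac{\partial^2 g_2}{\partial \Sigma\, \partial \theta}$, is a $(KN) \times L$ matrix of $KL$ blocks for
$N$-columns, whose $(i,j)$-block is given by $\frac{\partial}{\partial \theta}\left(\frac{\partial g_2}{\partial \Sigma_{ij}}\right)$. Finally, for any symmetric matrix $\Sigma \in \mathbb{R}^{d \times d}$,
$\lambda_1(\Sigma)$ and $\lambda_d(\Sigma)$ respectively denote its smallest and largest eigenvalue.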


2 Preliminaries 


First of all, we need to define our notion of distances on the space of mixing measures $G$. In this
paper, we restrict ourselves to the space of discrete mixing measures with exactly $k_0$ distinct support
points on $\Theta \times \Omega$, which is denoted by $\mathcal{E}_{k_0}(\Theta \times \Omega)$, and the space of discrete mixing measures with
at most $k$ distinct support points on $\Theta \times \Omega$, which is denoted by $\mathcal{O}_k(\Theta \times \Omega)$. In addition, let
$\mathcal{G}(\Theta \times \Omega) = \bigcup_{k \in \mathbb{N}} \mathcal{E}_k(\Theta \times \Omega)$ be the set of all discrete measures with finite support points. Consider a mixing
measure $G = \sum_{i=1}^{k} p_i \delta_{(\theta_i,\Sigma_i)}$, where $p = (p_1, p_2, \ldots, p_k)$ denotes the proportion vector and
$(\theta, \Sigma) = ((\theta_1, \Sigma_1), \ldots, (\theta_k, \Sigma_k))$ denotes the supporting atoms in $\Theta \times \Omega$. Likewise, let $G' = \sum_{i=1}^{k'} p'_i \delta_{(\theta'_i,\Sigma'_i)}$.
A coupling between $p$ and $p'$ is a joint distribution $q$ on $[1, \ldots, k] \times [1, \ldots, k']$, which is expressed as a
matrix $q = (q_{ij})_{1 \leq i \leq k,\, 1 \leq j \leq k'} \in [0,1]^{k \times k'}$ and admits marginal constraints $\sum_{i=1}^{k} q_{ij} = p'_j$ and $\sum_{j=1}^{k'} q_{ij} = p_i$
for any $i = 1, 2, \ldots, k$ and $j = 1, 2, \ldots, k'$. We call $q$ a coupling of $p$ and $p'$, and use $Q(p, p')$ to
denote the space of all such couplings.

As in Nguyen [2013], our tool for analyzing the identifiability and convergence of parameters in
a mixture model is the Wasserstein distance, which can be defined as the optimal cost of
moving mass from one probability measure to another [Villani, 2009]. For any $r \geq 1$, the $r$-th order
Wasserstein distance between $G$ and $G'$ is given by

\[
W_r(G, G') = \left( \inf_{q \in Q(p,p')} \sum_{i,j} q_{ij} \left( \|\theta_i - \theta'_j\| + \|\Sigma_i - \Sigma'_j\| \right)^r \right)^{1/r}.
\]

In both occurrences in the above display, $\| \cdot \|$ denotes either the $\ell_2$ norm for elements in $\mathbb{R}^d$ or the
entrywise $\ell_2$ norm for matrices. A central theme of the paper is the relationship between the Wasserstein
distances of mixing measures $G, G'$ and distances of corresponding mixture densities $p_G, p_{G'}$. Recall
that the mixture density $p_G$ is obtained by combining a mixing measure $G \in \mathcal{G}(\Theta \times \Omega)$ with a family of
density functions $\{f(x|\theta,\Sigma),\ \theta \in \Theta,\ \Sigma \in \Omega\}$:

\[
p_G(x) = \int f(x|\theta,\Sigma)\, dG(\theta,\Sigma) = \sum_{i=1}^{k} p_i f(x|\theta_i,\Sigma_i).
\]


Clearly if $G = G'$ then $p_G = p_{G'}$. Intuitively, if $W_1(G, G')$ or $W_2(G, G')$ is small, so is a distance
between $p_G$ and $p_{G'}$. This can be quantified by establishing an upper bound for the distance of $p_G$
and $p_{G'}$ in terms of $W_1(G, G')$ or $W_2(G, G')$. A general notion of distance between probability
densities defined on a common space is the $f$-divergence (or Ali-Silvey distance) [Ali and Silvey, 1966]: an
$f$-divergence between two probability density functions $f$ and $g$ is defined as $\rho_\phi(f, g) = \int \phi\left(\frac{g}{f}\right) f\, d\mu$,
where $\phi : \mathbb{R} \to \mathbb{R}$ is a convex function. Similarly, the $f$-divergence between $p_G$ and $p_{G'}$ is
$\rho_\phi(p_G, p_{G'}) = \int \phi\left(\frac{p_{G'}}{p_G}\right) p_G\, d\mu$. With $\phi(x) = \frac{1}{2}(\sqrt{x} - 1)^2$, we obtain the squared Hellinger distance ($\rho_{h^2} = h^2$). With
$\phi(x) = \frac{1}{2}|x - 1|$, we obtain the variational distance ($\rho_V = V$).

A simple way of establishing an upper bound for an $f$-divergence between $p_G$ and $p_{G'}$ is via the
"composite transportation distance" between mixing measures $G, G'$:

\[
d_{\rho_\phi}(G, G') = \inf_{q \in Q(p,p')} \sum_{i,j} q_{ij}\, \rho_\phi(f_i, f'_j),
\]

where $f_i = f(x|\theta_i, \Sigma_i)$ and $f'_j = f(x|\theta'_j, \Sigma'_j)$ for any $i, j$. The following inequality regarding the
relationship between $\rho_\phi(p_G, p_{G'})$ and $d_{\rho_\phi}(G, G')$ is a simple consequence of Jensen's inequality [Nguyen,
2013]:

\[
\rho_\phi(p_G, p_{G'}) \leq d_{\rho_\phi}(G, G').
\]
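The $f$-divergence definition and the composite transportation bound can be checked on a toy problem. The sketch below works with discrete densities on a three-point sample space, so integrals become finite sums. The component densities, weights and function names are hypothetical, and the product coupling $q_{ij} = p_i p'_j$ is used only because it is always feasible; it generally does not attain the infimum, so it gives an upper bound on the composite transportation distance.

```python
import math

def f_divergence(phi, f, g):
    """rho_phi(f, g) = sum_x phi(g(x) / f(x)) * f(x), for discrete densities with f > 0."""
    return sum(phi(gx / fx) * fx for fx, gx in zip(f, g))

def phi_hellinger(t):
    return 0.5 * (math.sqrt(t) - 1.0) ** 2   # yields the squared Hellinger distance

def phi_variational(t):
    return 0.5 * abs(t - 1.0)                # yields the variational distance

def mix(weights, components):
    """Mixture density sum_i p_i * f_i on a finite sample space."""
    return [sum(p * comp[x] for p, comp in zip(weights, components))
            for x in range(len(components[0]))]

# Hypothetical component densities on a three-point sample space
f_comps = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]   # kernels at the atoms of G
g_comps = [[0.6, 0.3, 0.1], [0.2, 0.2, 0.6]]   # kernels at the atoms of G'
p, p_prime = [0.5, 0.5], [0.4, 0.6]

lhs = f_divergence(phi_variational, mix(p, f_comps), mix(p_prime, g_comps))
# Product coupling q_ij = p_i * p'_j: feasible, though generally not optimal
rhs = sum(p[i] * p_prime[j] * f_divergence(phi_variational, f_comps[i], g_comps[j])
          for i in range(2) for j in range(2))
```

Since the bound holds for every feasible coupling, `lhs <= rhs` here as well; minimizing over couplings would tighten `rhs` to the composite transportation distance itself.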

It is straightforward to derive upper bounds for $d_{\rho_\phi}(G, G')$ in terms of Wasserstein distances $W_r$, by
taking into account specific structures of the density family $f$, and then combine with the inequality in
the previous display to arrive at upper bounds for $\rho_\phi(p_G, p_{G'})$ in terms of Wasserstein distances. Here
are a few examples.


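Example 2.1. (Multivariate generalized Gaussian distribution [Zhang et al., 2013])

The density family $f$ takes the form
$f(x|\theta, m, \Sigma) = \frac{m\Gamma(d/2)}{\pi^{d/2}\Gamma(d/(2m))|\Sigma|^{1/2}} \exp\left(-((x - \theta)^T \Sigma^{-1}(x - \theta))^m\right)$,
where $\theta \in \mathbb{R}^d$, $m > 0$, and $\Sigma \in S_d^{++}$. If $\Theta_1$ is a bounded subset of $\mathbb{R}^d$,
$\Theta_2 = \{m \in \mathbb{R}_+ : 1 \le \underline{m} \le m \le \overline{m}\}$, and
$\Omega = \{\Sigma \in S_d^{++} : \underline{\lambda} \le \sqrt{\lambda_1(\Sigma)} \le \sqrt{\lambda_d(\Sigma)} \le \overline{\lambda}\}$, where $\underline{\lambda}, \overline{\lambda} > 0$, then for any
$G_1, G_2 \in \mathcal{G}(\Theta_1 \times \Theta_2 \times \Omega)$, we obtain $h^2(p_{G_1}, p_{G_2}) \lesssim W_2^2(G_1, G_2)$ and $V(p_{G_1}, p_{G_2}) \lesssim W_1(G_1, G_2)$.

Example 2.2. (Multivariate Student's t-distribution)

The density family $f$ takes the form $f(x|\theta, \Sigma) = C_\nu (\nu + (x - \theta)^T \Sigma^{-1}(x - \theta))^{-(\nu + d)/2}$, where $\nu$
is a fixed positive degree of freedom and $C_\nu$ is the normalizing constant. If $\Theta$ is a bounded subset of $\mathbb{R}^d$
and $\Omega = \{\Sigma \in S_d^{++} : \underline{\lambda} \le \sqrt{\lambda_1(\Sigma)} \le \sqrt{\lambda_d(\Sigma)} \le \overline{\lambda}\}$, then for any $G_1, G_2 \in \mathcal{G}(\Theta \times \Omega)$, we obtain
$h^2(p_{G_1}, p_{G_2}) \lesssim W_2^2(G_1, G_2)$ and $V(p_{G_1}, p_{G_2}) \lesssim W_1(G_1, G_2)$.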


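Example 2.3. (Exponentially modified multivariate Student's t-distribution)

Let $f(x|\theta, \lambda, \Sigma)$ be the density function of $X = Y + Z$, where $Y$ follows a multivariate t-distribution with
location $\theta$, covariance matrix $\Sigma$, fixed positive degree of freedom $\nu$, and $Z$ is distributed by the product
of $d$ independent exponential distributions with combined shape $\lambda = (\lambda_1, \ldots, \lambda_d)$. If $\Theta$ is a bounded
subset of $\mathbb{R}^d \times \mathbb{R}^d_+$, where $\mathbb{R}^d_+ = \{x \in \mathbb{R}^d : x_i > 0 \ \forall i\}$, and
$\Omega = \{\Sigma \in S_d^{++} : \underline{\lambda} \le \sqrt{\lambda_1(\Sigma)} \le \sqrt{\lambda_d(\Sigma)} \le \overline{\lambda}\}$, then for any $G_1, G_2 \in \mathcal{G}(\Theta \times \Omega)$,
$h^2(p_{G_1}, p_{G_2}) \lesssim W_1(G_1, G_2)$ and $V(p_{G_1}, p_{G_2}) \lesssim W_1(G_1, G_2)$.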


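Example 2.4. (Modified Gaussian-Gamma distribution)

Let $f(x|\theta, \alpha, \beta, \Sigma)$ be the density function of $X = Y + Z$, where $Y$ is distributed by a multivariate Gaussian
distribution with mean $\theta$, covariance matrix $\Sigma$, and $Z$ is distributed by the product of independent
Gamma distributions with combined shape vector $\alpha = (\alpha_1, \ldots, \alpha_d)$ and combined rate vector
$\beta = (\beta_1, \ldots, \beta_d)$. If $\Theta$ is a bounded subset of $\mathbb{R}^d \times \mathbb{R}^d_+ \times \mathbb{R}^d_+$ and
$\Omega = \{\Sigma \in S_d^{++} : \underline{\lambda} \le \sqrt{\lambda_1(\Sigma)} \le \sqrt{\lambda_d(\Sigma)} \le \overline{\lambda}\}$, then for any $G_1, G_2 \in \mathcal{G}(\Theta \times \Omega)$,
$h^2(p_{G_1}, p_{G_2}) \lesssim V(p_{G_1}, p_{G_2}) \lesssim W_1(G_1, G_2)$.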


3 General theory of strong identifiability 


The objective of this section is to develop a general theory according to which a small distance between
mixture densities $p_G$ and $p_{G'}$ entails a small Wasserstein distance between mixing measures $G$ and $G'$.
The classical identifiability criterion requires that $p_G = p_{G'}$ entail $G = G'$, which is essentially
equivalent to a linear independence requirement for the density family $\{f(x|\theta, \Sigma) | \theta \in \Theta, \Sigma \in \Omega\}$.
To obtain quantitative bounds, we need stronger notions of identifiability, ones which involve higher
order derivatives of the density function $f$, taken with respect to the multivariate and matrix-variate
parameters present in the mixture model. The advantage of this theory, which extends the work
of Nguyen [2013] and Chen [1995], is that it holds generally for a broad range of mixture models,
which allow for the same bounds on the Wasserstein distances of mixing measures to hold. This in
turn leads to “standard” rates of convergence for the mixing measure. On the other hand, many popular
mixture models such as the location-covariance Gaussian mixture, mixture of Gamma, and mixture of
skew-Gaussian distributions do not submit to the general theory. Instead they require separate and
fundamentally distinct treatments; moreover, such models also exhibit non-standard rates of convergence
for the mixing measure. Readers interested in results for such models may skip directly to Section 4.


3.1 Definitions and general bounds 

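Definition 3.1. The family $\{f(x|\theta, \Sigma), \theta \in \Theta, \Sigma \in \Omega\}$ is identifiable in the first-order if $f(x|\theta, \Sigma)$ is
differentiable in $(\theta, \Sigma)$ and the following assumption holds

A1. For any finite $k$ different pairs $(\theta_1, \Sigma_1), \ldots, (\theta_k, \Sigma_k) \in \Theta \times \Omega$, if we have $\alpha_i \in \mathbb{R}$, $\beta_i \in \mathbb{R}^{d_1}$ and
symmetric matrices $\gamma_i \in \mathbb{R}^{d_2 \times d_2}$ (for all $i = 1, \ldots, k$) such that

$$\sum_{i=1}^{k} \alpha_i f(x|\theta_i, \Sigma_i) + \beta_i^T \frac{\partial f}{\partial \theta}(x|\theta_i, \Sigma_i) + \mathrm{tr}\left(\frac{\partial f}{\partial \Sigma}(x|\theta_i, \Sigma_i)^T \gamma_i\right) = 0 \quad \text{for almost all } x$$

then this will entail that $\alpha_i = 0$, $\beta_i = 0 \in \mathbb{R}^{d_1}$, $\gamma_i = 0 \in \mathbb{R}^{d_2 \times d_2}$ for $i = 1, \ldots, k$.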


Remark. The condition that $\gamma_i$ is symmetric in Definition 3.1 is crucial, without which the
identifiability condition would fail for many classes of density. For instance, assume that
$\frac{\partial f}{\partial \Sigma}(x|\theta_i, \Sigma_i)$
are symmetric matrices for all $i$ (this clearly holds for any elliptical distribution, such as the multivariate
Gaussian, Student's t-distribution, and logistic distribution). If we choose $\gamma_i$ to be anti-symmetric
matrices, then by choosing $\alpha_i = 0$, $\beta_i = 0$, $(\gamma_i)_{uu} = 0$ for all $1 \le u \le d_2$ (i.e. all diagonal elements
are 0), the equation in condition A1 holds, because the trace of the product of a symmetric and an
anti-symmetric matrix is zero, while $\gamma_i$ can be different from 0 for all $i$.
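The linear-algebra fact behind this remark can be seen on a small instance. The sketch below uses an arbitrary symmetric matrix as a stand-in for $\partial f/\partial \Sigma$ and a non-zero anti-symmetric $\gamma$; the numerical values and the helper name `trace_of_product_transpose` are assumptions chosen only for illustration.

```python
def trace_of_product_transpose(a, b):
    """Return tr(a^T b) = sum over (u, v) of a[u][v] * b[u][v] for square nested lists."""
    size = len(a)
    return sum(a[u][v] * b[u][v] for u in range(size) for v in range(size))

# Stand-in for a symmetric derivative matrix df/dSigma (values are arbitrary).
symmetric = [[2.0, -1.0, 0.5],
             [-1.0, 3.0, 4.0],
             [0.5, 4.0, -7.0]]

# A non-zero anti-symmetric gamma with zero diagonal.
anti_symmetric = [[0.0, 1.5, -2.0],
                  [-1.5, 0.0, 0.25],
                  [2.0, -0.25, 0.0]]

value = trace_of_product_transpose(symmetric, anti_symmetric)
```

Each off-diagonal pair contributes terms of opposite sign, so the trace vanishes even though $\gamma \ne 0$; this is why condition A1 only quantifies over symmetric $\gamma_i$.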

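Additionally, we say the family of densities $f$ is uniformly Lipschitz up to the first order if the
following holds: there are positive constants $\delta_1, \delta_2$ such that for any $R_1, R_2, R_3 > 0$, $\gamma_1 \in \mathbb{R}^{d_1}$,
$\gamma_2 \in \mathbb{R}^{d_2 \times d_2}$, $R_1 \le \sqrt{\lambda_1(\Sigma)} \le \sqrt{\lambda_{d_2}(\Sigma)} \le R_2$, $\|\theta\| \le R_3$, $\theta_1, \theta_2 \in \Theta$, $\Sigma_1, \Sigma_2 \in \Omega$, there are
positive constants $C(R_1, R_2)$ and $C(R_3)$ such that for all $x \in \mathcal{X}$

$$\left| \gamma_1^T \left( \frac{\partial f}{\partial \theta}(x|\theta_1, \Sigma) - \frac{\partial f}{\partial \theta}(x|\theta_2, \Sigma) \right) \right| \le C(R_1, R_2) \|\theta_1 - \theta_2\|^{\delta_1} \|\gamma_1\| \qquad (4)$$

and

$$\left| \mathrm{tr}\left( \left( \frac{\partial f}{\partial \Sigma}(x|\theta, \Sigma_1) - \frac{\partial f}{\partial \Sigma}(x|\theta, \Sigma_2) \right)^T \gamma_2 \right) \right| \le C(R_3) \|\Sigma_1 - \Sigma_2\|^{\delta_2} \|\gamma_2\|. \qquad (5)$$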


First-order identifiability is sufficient for deriving a lower bound of $V(p_G, p_{G_0})$ in terms of $W_1(G, G_0)$,
under the exact-fitted setting: This is the setting where $G_0$ has exactly $k_0$ support points, $k_0$ known:


Theorem 3.1. (Exact-fitted setting) Suppose that the density family $f$ is identifiable in the first order
and admits uniform Lipschitz property up to the first order. Then there are positive constants $\epsilon_0$ and
$C_0$, both depending on $G_0$, such that as long as $G \in \mathcal{E}_{k_0}(\Theta \times \Omega)$ and $W_1(G, G_0) \le \epsilon_0$, we have

$$V(p_G, p_{G_0}) \ge C_0 W_1(G, G_0).$$


Note that we do not impose any boundedness on $\Theta$ or $\Omega$. Nonetheless, the bound is of local nature,
in the sense that it holds only for those $G$ sufficiently close to $G_0$ by a Wasserstein distance at most $\epsilon_0$,
which again varies with $G_0$. It is possible to extend this type of bound to hold globally over a compact
subset of the space of mixing measures, under a mild regularity condition, as the following corollary
asserts:


Corollary 3.1. Suppose that the density family $f$ is identifiable in the first order, and admits uniform
Lipschitz property up to the first order. Further, there is a positive constant $\alpha > 0$ such that for any
$G_1, G_2 \in \mathcal{E}_{k_0}(\Theta \times \Omega)$, we have $V(p_{G_1}, p_{G_2}) \lesssim W_1^\alpha(G_1, G_2)$. Then, for a fixed compact subset $\mathcal{G}$ of
$\mathcal{E}_{k_0}(\Theta \times \Omega)$, there is a positive constant $C_0 = C_0(G_0)$ such that

$$V(p_G, p_{G_0}) \ge C_0 W_1(G, G_0) \quad \text{for all } G \in \mathcal{G}.$$

We shall verify in the sequel that the classes of densities $f$ described in Examples 2.1, 2.2, 2.3 and
2.4 are all identifiable in the first order. Thus, a remarkable consequence of the result above is that for
such classes of densities, the variational distance $V$ on mixture densities and the Wasserstein distance
$W_1$ on the corresponding mixing measures are in fact equivalent in the exact-fitted setting. That is,
when $G$ shares the same number of support points as that of $G_0$, we have

$$V(p_G, p_{G_0}) \asymp W_1(G, G_0).$$


Moving to the over-fitted setting, where $G_0$ has exactly $k_0$ support points lying in the interior of
$\Theta \times \Omega$, but $k_0$ is unknown and only an upper bound for $k_0$ is given, a stronger identifiability condition is
required. This condition involves the second-order derivatives of the density class $f$ and extends
the notion of strong identifiability considered by Chen [1995] and Nguyen [2013]:

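Definition 3.2. The family $\{f(x|\theta, \Sigma), \theta \in \Theta, \Sigma \in \Omega\}$ is identifiable in the second-order if $f(x|\theta, \Sigma)$
is twice differentiable in $(\theta, \Sigma)$ and the following assumption holds

A2. For any finite $k$ different pairs $(\theta_1, \Sigma_1), \ldots, (\theta_k, \Sigma_k) \in \Theta \times \Omega$, if we have $\alpha_i \in \mathbb{R}$, $\beta_i, \nu_i \in \mathbb{R}^{d_1}$,
$\gamma_i, \eta_i$ symmetric matrices in $\mathbb{R}^{d_2 \times d_2}$ as $i = 1, \ldots, k$ such that

$$\sum_{i=1}^{k} \Big\{ \alpha_i f(x|\theta_i, \Sigma_i) + \beta_i^T \frac{\partial f}{\partial \theta}(x|\theta_i, \Sigma_i) + \nu_i^T \frac{\partial^2 f}{\partial \theta^2}(x|\theta_i, \Sigma_i) \nu_i + \mathrm{tr}\Big(\frac{\partial f}{\partial \Sigma}(x|\theta_i, \Sigma_i)^T \gamma_i\Big)$$
$$\qquad + 2 \nu_i^T \Big[\frac{\partial}{\partial \theta}\Big(\mathrm{tr}\Big(\frac{\partial f}{\partial \Sigma}(x|\theta_i, \Sigma_i)^T \eta_i\Big)\Big)\Big] + \mathrm{tr}\Big(\frac{\partial}{\partial \Sigma}\Big(\mathrm{tr}\Big(\frac{\partial f}{\partial \Sigma}(x|\theta_i, \Sigma_i)^T \eta_i\Big)\Big)^T \eta_i\Big) \Big\} = 0 \quad \text{for almost all } x,$$

then this will entail that $\alpha_i = 0$, $\beta_i = \nu_i = 0 \in \mathbb{R}^{d_1}$, $\gamma_i = \eta_i = 0 \in \mathbb{R}^{d_2 \times d_2}$ for $i = 1, \ldots, k$.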


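In addition, we say the family of densities $f$ is uniformly Lipschitz up to the second order if the
following holds: there are positive constants $\delta_3, \delta_4$ such that for any $R_4, R_5, R_6 > 0$, $\gamma_1 \in \mathbb{R}^{d_1}$,
$\gamma_2 \in \mathbb{R}^{d_2 \times d_2}$, $R_4 \le \sqrt{\lambda_1(\Sigma)} \le \sqrt{\lambda_{d_2}(\Sigma)} \le R_5$, $\|\theta\| \le R_6$, $\theta_1, \theta_2 \in \Theta$, $\Sigma_1, \Sigma_2 \in \Omega$, there are
positive constants $C_1$ depending on $(R_4, R_5)$ and $C_2$ depending on $R_6$ such that for all $x \in \mathcal{X}$

$$\left| \gamma_1^T \left( \frac{\partial^2 f}{\partial \theta^2}(x|\theta_1, \Sigma) - \frac{\partial^2 f}{\partial \theta^2}(x|\theta_2, \Sigma) \right) \gamma_1 \right| \le C_1 \|\theta_1 - \theta_2\|^{\delta_3} \|\gamma_1\|^2$$

and

$$\left| \mathrm{tr}\left( \left[ \frac{\partial}{\partial \Sigma}\Big(\mathrm{tr}\Big(\frac{\partial f}{\partial \Sigma}(x|\theta, \Sigma_1)^T \gamma_2\Big)\Big) - \frac{\partial}{\partial \Sigma}\Big(\mathrm{tr}\Big(\frac{\partial f}{\partial \Sigma}(x|\theta, \Sigma_2)^T \gamma_2\Big)\Big) \right]^T \gamma_2 \right) \right| \le C_2 \|\Sigma_1 - \Sigma_2\|^{\delta_4} \|\gamma_2\|^2.$$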


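Let $k \ge 2$ and $k_0 \ge 1$ be fixed positive integers where $k \ge k_0 + 1$. $G_0 \in \mathcal{E}_{k_0}$ while $G$ varies in $\mathcal{O}_k$.
Then, we can establish the following result

Theorem 3.2. (Over-fitted setting)

(a) Suppose that the density family $f$ is identifiable in the second order and admits uniform Lipschitz
property up to the second order. Moreover, $\Theta$ is a bounded subset of $\mathbb{R}^{d_1}$ and $\Omega$ is a subset of $S_{d_2}^{++}$
such that the largest eigenvalues of elements of $\Omega$ are bounded above. In addition, suppose that
$\lim_{\lambda_1(\Sigma) \to 0} f(x|\theta, \Sigma) = 0$ for all $x \in \mathcal{X}$ and $\theta \in \Theta$. Then there are positive constants $\epsilon_0$ and $C_0$
depending on $G_0$ such that as long as $W_2(G, G_0) \le \epsilon_0$,

$$V(p_G, p_{G_0}) \ge C_0 W_2^2(G, G_0).$$

(b) (Optimality of bound for variation distance) Assume that $f$ is second-order differentiable with
respect to $\theta, \Sigma$ and
$\sup_{\theta \in \Theta, \Sigma \in \Omega} \int_{x \in \mathcal{X}} \left| \frac{\partial^2 f}{\partial \theta^{\alpha_1} \partial \Sigma^{\alpha_2}}(x|\theta, \Sigma) \right| dx < \infty$
for all $\alpha_1 = (\alpha_i^1)_{i=1}^{d_1} \in \mathbb{N}^{d_1}$, $\alpha_2 = (\alpha_{uv}^2)_{1 \le u, v \le d_2} \in \mathbb{N}^{d_2 \times d_2}$ such that
$\sum_{i=1}^{d_1} \alpha_i^1 + \sum_{1 \le u, v \le d_2} \alpha_{uv}^2 = 2$. Then, for any $1 \le r < 2$:

$$\lim_{\epsilon \to 0} \inf_{G \in \mathcal{O}_k(\Theta \times \Omega)} \left\{ V(p_G, p_{G_0}) / W_1^r(G, G_0) : W_1(G, G_0) \le \epsilon \right\} = 0.$$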


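(c) (Optimality of bound for Hellinger distance) Assume that $f$ is second-order differentiable with
respect to $\theta, \Sigma$ and we can find $c_0$ sufficiently small such that

$$\sup_{\|\theta - \theta'\| + \|\Sigma - \Sigma'\| \le c_0} \int_{x \in \mathcal{X}} \left( \frac{\partial^2 f}{\partial \theta^{\alpha_1} \partial \Sigma^{\alpha_2}}(x|\theta, \Sigma) \right)^2 \Big/ f(x|\theta', \Sigma')\, dx < \infty,$$

where $\alpha_1, \alpha_2$ are defined as in part (b). Then, for any $1 \le r < 2$:

$$\lim_{\epsilon \to 0} \inf_{G \in \mathcal{O}_k(\Theta \times \Omega)} \left\{ h(p_G, p_{G_0}) / W_1^r(G, G_0) : W_1(G, G_0) \le \epsilon \right\} = 0. \qquad (6)$$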


Here and elsewhere, the ratio $V/W_r$ is set to be $\infty$ if $W_r(G, G_0) = 0$. We make a few remarks.

(i) A counterpart of part (a) for finite mixtures with multivariate parameters was given in Nguyen
[2013] (Proposition 1). The proof in that paper has a problem: it relies on Nguyen's Theorem 1,
which holds only for the exact-fitted setting, but not for the over-fitted setting. This was pointed
out to the second author by Elisabeth Gassiat who attributed it to Jonas Kahn. Fortunately, this
error can be simply corrected by replacing Nguyen's Theorem 1 with a weaker version, which
holds for the over-fitted setting and suffices for our purpose, for which his method of proof
continues to apply. For part (a), it suffices to prove only the following weaker version:

$$\lim_{\epsilon \to 0} \inf_{G \in \mathcal{O}_k(\Theta \times \Omega)} \left\{ V(p_G, p_{G_0}) / W_2^2(G, G_0) : W_2(G, G_0) \le \epsilon \right\} > 0.$$


(ii) The mild condition $\lim_{\lambda_1(\Sigma) \to 0} f(x|\theta, \Sigma) = 0$ is important for the matrix-variate parameter $\Sigma$. In
particular, it is useful for addressing the scenario when the smallest eigenvalue of matrix parameter
$\Sigma$ is not bounded away from 0. This condition, however, can be removed if we impose that
$\Sigma$ is a positive definite matrix whose eigenvalues are bounded away from 0.

(iii) Part (b) demonstrates the sharpness of the bound in part (a). In particular, we cannot improve the
lower bound in part (a) to any quantity $W_1^r(G, G_0)$ for any $r < 2$. For any estimation method
that yields $n^{-1/2}$ convergence rate under the Hellinger distance for $p_G$, part (a) induces
$n^{-1/4}$ convergence rate under $W_2$ for $G$. Part (c) implies that $n^{-1/4}$ is minimax optimal.

(iv) The boundedness of $\Theta$, as well as the boundedness from above of the eigenvalues of elements of
$\Omega$ are both necessary conditions. Indeed, it is possible to show that if one of these two conditions
is not met, it is not possible to obtain the lower bound of $V(p_G, p_{G_0})$ as established, because
distance $h \gtrsim V$ can vanish much faster than $W_r(G, G_0)$, as can be seen by:

Proposition 3.1. Let $\Theta$ be a subset of $\mathbb{R}^{d_1}$ and $\Omega = S_{d_2}^{++}$. Then for any $r \ge 1$ and $\beta > 0$ we have

$$\lim_{\epsilon \to 0} \inf_{G \in \mathcal{O}_k(\Theta \times \Omega)} \left\{ \exp\left( \frac{1}{W_r^\beta(G, G_0)} \right) h(p_G, p_{G_0}) : W_r(G, G_0) \le \epsilon \right\} = 0.$$

As in the exact-fitted setting, in order to establish the bound $V \gtrsim W_2^2$ globally, we simply add a
compactness condition on the subset within which $G$ varies:

Corollary 3.2. Assume that $\Theta$ and $\Omega$ are two compact subsets of $\mathbb{R}^{d_1}$ and $S_{d_2}^{++}$ respectively. Suppose
that the density family $f$ is identifiable in the second order and admits uniform Lipschitz property up to
the second order. Further, there is a positive constant $\alpha \le 2$ such that for any $G_1, G_2 \in \mathcal{O}_k(\Theta \times \Omega)$,
we have $V(p_{G_1}, p_{G_2}) \lesssim W_2^\alpha(G_1, G_2)$. Then for a fixed compact subset $\mathcal{O}$ of $\mathcal{O}_k(\Theta \times \Omega)$ there is a
positive constant $C_0 = C_0(G_0)$ such that

$$V(p_G, p_{G_0}) \ge C_0 W_2^2(G, G_0) \quad \text{for all } G \in \mathcal{O}.$$


3.2 Characterization of strong identifiability 

In this subsection we identify a broad range of density classes for which the strong identifiability
conditions developed previously hold either in the first or the second order. We then present a
general result which shows how strong identifiability conditions continue to be preserved under certain
transformations with respect to the parameter space.

First, we consider univariate density functions with parameters of multiple types: 


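Theorem 3.3. (Densities with multiple varying parameters)

(a) Generalized univariate logistic density function: Let $f(x|\theta, \sigma) := \frac{1}{\sigma} f((x - \theta)/\sigma)$, where
$f(x) = \frac{\Gamma(p + q)}{\Gamma(p)\Gamma(q)} \frac{\exp(px)}{(1 + \exp(x))^{p+q}}$, and $p, q$ are fixed positive integers. Then the family
$\{f(x|\theta, \sigma), \theta \in \mathbb{R}, \sigma \in \mathbb{R}_+\}$ is identifiable in the second order.

(b) Generalized Gumbel density function: Let $f(x|\theta, \sigma, \lambda) := \frac{1}{\sigma} f((x - \theta)/\sigma, \lambda)$, where
$f(x, \lambda) = \frac{\lambda^\lambda}{\Gamma(\lambda)} \exp(-\lambda(x + \exp(-x)))$ as $\lambda > 0$. Then the family
$\{f(x|\theta, \sigma, \lambda), \theta \in \mathbb{R}, \sigma \in \mathbb{R}_+, \lambda \in \mathbb{R}_+\}$ is identifiable in the second order.

(c) Univariate Weibull distribution: Let $f_X(x|\nu, \lambda) = \frac{\nu}{\lambda} \left(\frac{x}{\lambda}\right)^{\nu - 1} \exp\left(-\left(\frac{x}{\lambda}\right)^\nu\right)$, for $x \ge 0$, where
$\nu, \lambda > 0$ are shape and scale parameters, respectively. Then the family $\{f_X(x|\nu, \lambda), \nu \in \mathbb{R}_+, \lambda \in \mathbb{R}_+\}$
is identifiable in the second order.

(d) Von Mises distributions [Mardia, 1975; Hsu et al., 1981; Kent, 1983]: Denote
$f(x|\mu, \kappa) = \frac{1}{2\pi I_0(\kappa)} \exp(\kappa \cos(x - \mu)) \cdot 1_{\{x \in [0, 2\pi)\}}$, where $\mu \in [0, 2\pi)$, $\kappa > 0$, and $I_0(\kappa)$ is the modified
Bessel function of order 0. Then the family $\{f(x|\mu, \kappa), \mu \in [0, 2\pi), \kappa \in \mathbb{R}_+\}$ is identifiable in
the second order.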


Next, we turn to density function classes with matrix-variate parameter spaces, as introduced in
Section 2.

Theorem 3.4. (Densities with matrix-variate parameters)

(a) The family $\{f(x|\theta, \Sigma, m), \theta \in \mathbb{R}^d, \Sigma \in S_d^{++}, m \ge 1\}$ of multivariate generalized Gaussian
distribution is identifiable in the first order.

(b) The family $\{f(x|\theta, \Sigma), \theta \in \mathbb{R}^d, \Sigma \in S_d^{++}\}$ of multivariate t-distribution with fixed odd degree of
freedom is identifiable in the second order.

(c) The family $\{f(x|\theta, \Sigma, \lambda), \theta \in \mathbb{R}^d, \Sigma \in S_d^{++}, \lambda \in \mathbb{R}^d_+\}$ of exponentially modified multivariate
t-distribution with fixed odd degree of freedom is identifiable in the second order.

(d) The family |/(x|0, S, a, 5), 0 E S E •S’j”’", a E E d > 2} o/ modified multivari¬ 

ate Gaussian-Gamma distribution is identifiable in the first order. 


We note that these theorems are quite similar to Chen's analysis on classes of density with single
parameter spaces (cf. Chen [1995]). The proofs of these results, however, are technically nontrivial
even if conceptually somewhat straightforward. For the transparency of our idea, we only demonstrate
the results in Theorem 3.3 and Theorem 3.4 up to the first-order identifiability. The proof technique
for the second-order identifiability is similar. The proofs are given in the Appendices. As can be seen
in these proofs, the strong identifiability of these density classes is established by exploiting how
the corresponding characteristic functions (i.e., Fourier transforms of the densities) vanish at infinity.
Thus it can be concluded that the common feature in establishing strong identifiability hinges on the
smoothness of the density $f$ in question. (It is interesting to contrast this with the story in the next
section, where we shall meet weakly identifiable density classes whose algebraic structures play a
more significant role in our theory).

We also add several technical remarks: Regarding part (a), we demonstrate in Proposition 4.1 later
that the class of multivariate Gaussian distributions is not identifiable in the
second order. The condition of odd degree of freedom in parts (b) and (c) of Theorem 3.4 is mainly due to
our proof technique. We believe both (b) and (c) hold for any fixed positive degree of freedom, but do
not have a proof for such a setting.

Before ending this section, we state a general result which is a response to a question posed by
Xuming He on the identifiability in transformed parameter spaces. The following theorem states that
the first-order identifiability with respect to a transformed parameter space is preserved under some
regularity conditions of the transformation operator. Let $T$ be a bijective mapping from $\Theta^* \times \Omega^*$ to
$\Theta \times \Omega$ such that

$$T(\eta, \Lambda) = (T_1(\eta, \Lambda), T_2(\eta, \Lambda)) = (\theta, \Sigma)$$

for all $(\eta, \Lambda) \in \Theta^* \times \Omega^*$, where $\Theta^* \subset \mathbb{R}^{d_1}$ and $\Omega^* \subset S_{d_2}^{++}$. Define the class of density functions
$\{g(x|\eta, \Lambda), \eta \in \Theta^*, \Lambda \in \Omega^*\}$ by

$$g(x|\eta, \Lambda) := f(x|T(\eta, \Lambda)).$$

Additionally, for any $(\eta, \Lambda) \in \Theta^* \times \Omega^*$, let $J(\eta, \Lambda) \in \mathbb{R}^{(d_1 + d_2^2) \times (d_1 + d_2^2)}$ be the modified Jacobian
matrix of $T(\eta, \Lambda)$, i.e. the usual Jacobian matrix when $(\eta, \Lambda)$ is taken as a $d_1 + d_2^2$ vector.

Theorem 3.5. Assume that $\{f(x|\theta, \Sigma), \theta \in \Theta, \Sigma \in \Omega\}$ is identifiable in the first order. Then the class
of density functions $\{g(x|\eta, \Lambda), \eta \in \Theta^*, \Lambda \in \Omega^*\}$ is identifiable in the first order if and only if the
modified Jacobian matrix $J(\eta, \Lambda)$ is non-singular for all $(\eta, \Lambda) \in \Theta^* \times \Omega^*$.
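To make the Jacobian condition concrete, consider the univariate case $d_1 = d_2 = 1$, where the modified Jacobian is an ordinary $2 \times 2$ matrix. The two reparametrizations below, `log_scale` and `cubic_location`, are hypothetical examples introduced only for this sketch: both are bijections onto $\mathbb{R} \times \mathbb{R}_+$, but the second has a singular Jacobian at $\eta = 0$, so by the theorem the transformed family would fail first-order identifiability there.

```python
import math

def jacobian_2x2(transform, eta, lam, h=1e-6):
    """Central-difference Jacobian of a map (eta, lam) -> (theta, sigma2)."""
    eta_plus, eta_minus = transform(eta + h, lam), transform(eta - h, lam)
    lam_plus, lam_minus = transform(eta, lam + h), transform(eta, lam - h)
    return [
        [(eta_plus[0] - eta_minus[0]) / (2 * h), (lam_plus[0] - lam_minus[0]) / (2 * h)],
        [(eta_plus[1] - eta_minus[1]) / (2 * h), (lam_plus[1] - lam_minus[1]) / (2 * h)],
    ]

def det_2x2(m):
    """Determinant of a 2 x 2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def log_scale(eta, lam):
    """Hypothetical map: keep the location, parametrize the variance by its logarithm."""
    return (eta, math.exp(lam))

def cubic_location(eta, lam):
    """Hypothetical map: bijective in eta, but with zero derivative at eta = 0."""
    return (eta ** 3, math.exp(lam))

det_log = det_2x2(jacobian_2x2(log_scale, 0.0, 0.3))
det_cubic_at_zero = det_2x2(jacobian_2x2(cubic_location, 0.0, 0.3))
det_cubic_away = det_2x2(jacobian_2x2(cubic_location, 1.0, 0.3))
```

In the singular case the chain rule gives $\partial g/\partial \eta = 3\eta^2\, \partial f/\partial \theta = 0$ at $\eta = 0$, so a non-zero coefficient on that derivative is invisible in the first-order condition.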

The conclusion of Theorem 3.5 still holds if we replace the first-order identifiability by the second-order
identifiability. As we have seen previously, strong identifiability (either in the first or second order)
yields sharp lower bounds of $V(p_G, p_{G_0})$ in terms of Wasserstein distances $W_r(G, G_0)$. It is useful to
know that in the transformed parameter space, one may still enjoy the same inequality. Specifically, for
any discrete probability measure $Q = \sum_{i=1}^{k} p_i \delta_{(\eta_i, \Lambda_i)} \in \mathcal{E}_k(\Theta^* \times \Omega^*)$, denote

$$p'_Q(x) = \int g(x|\eta, \Lambda)\, dQ(\eta, \Lambda) = \sum_{i=1}^{k} p_i g(x|\eta_i, \Lambda_i).$$

Let $Q_0$ be a fixed discrete probability measure on $\mathcal{E}_{k_0}(\Theta^* \times \Omega^*)$, while probability measure $Q$ varies
in $\mathcal{E}_{k_0}(\Theta^* \times \Omega^*)$.

Corollary 3.3. Assume that the conditions of Theorem 3.5 hold. Further, suppose that the first derivative
of $f$ in terms of $\theta, \Sigma$ and the first derivative of $T$ in terms of $\eta, \Lambda$ are $\alpha$-Hölder continuous and
bounded where $\alpha > 0$. Then there are positive constants $\epsilon_0 := \epsilon_0(Q_0)$ and $C_0 := C_0(Q_0)$ such that as
long as $Q \in \mathcal{E}_{k_0}(\Theta^* \times \Omega^*)$ and $W_1(Q, Q_0) \le \epsilon_0$, we have

$$V(p'_Q, p'_{Q_0}) \ge C_0 W_1(Q, Q_0).$$

Remark. If $\Theta$ and $\Omega$ are bounded sets, the condition on the boundedness of the first derivative of $f$
in terms of $\theta, \Sigma$ and the first derivative of $g$ in terms of $\eta, \Lambda$ can be left out. Additionally, the restriction
that these derivatives should be $\alpha$-Hölder continuous can be relaxed to only that the first derivative of
$f$ and the first derivative of $g$ are $\alpha_1$-Hölder continuous and $\alpha_2$-Hölder continuous where $\alpha_1, \alpha_2 > 0$
can be different.


4 Theory for weakly identifiable classes 

The general theory of strong identifiability developed in the previous section encompasses many classes
of distributions, but it is not applicable to some important classes, those that we shall call weakly
identifiable classes of distributions. These are the families of densities that are identifiable in the classi¬ 
cal sense in a finite mixture setting, but they do not satisfy the strong identifiability conditions we have 
defined previously. Such classes of densities give rise to the ubiquitous location-covariance Gaussian 
mixture, as well as mixture of Gamma distributions, and mixture of skew-Gaussian distributions. We 
will see that these density classes carry a quite varied and fascinating range of behaviors: the spe¬ 
cific algebraic structure of the density class in question now plays the fundamental role in determining 
identifiability and convergence properties for model parameters and the mixing measure. 


4.1 Over-fitted mixture of location-covariance Gaussian distributions 


Location-covariance Gaussian distributions belong to the broader class of generalized Gaussians (cf.
Example 2.1), which is identifiable in the first order according to Theorem 3.4. The class of
location-covariance Gaussian distributions, however, is not identifiable in the second order. This implies that in
the over-fitted mixture setting, Theorem 3.2 is not applicable.

In this section the multivariate Gaussian densities $\{f(x|\theta, \Sigma), \theta \in \mathbb{R}^d, \Sigma \in S_d^{++}\}$ are defined in the
usual way, i.e., $f(x|\theta, \Sigma) = \frac{1}{(2\pi)^{d/2}|\Sigma|^{1/2}} \exp(-(x - \theta)^T \Sigma^{-1}(x - \theta)/2)$. (Note that the scaling
in the exponent slightly differs from the version given in Example 2.1, where the Gaussian distribution
corresponds to setting $m = 1$, but this discrepancy is inconsequential). In fact, using the same approach
as the proof of Theorem 3.4 we can verify that for any fixed positive number $m > 1$, the class of
generalized Gaussian distributions is also identifiable in the second order. So within this broader family,
it is essentially only the class of Gaussian distributions with both location and covariance parameters
varying that is weakly identifiable.

Proposition 4.1. The family $\{f(x|\theta, \Sigma), \theta \in \mathbb{R}^d, \Sigma \in S_d^{++}\}$ of multivariate Gaussian distribution is
not identifiable in the second order.

Proof. The proof is immediate thanks to the following key identity, which holds for all $\theta \in \mathbb{R}^d$ and
$\Sigma \in S_d^{++}$:

$$\frac{\partial^2 f}{\partial \theta^2}(x|\theta, \Sigma) = 2 \frac{\partial f}{\partial \Sigma}(x|\theta, \Sigma). \qquad (7)$$

This identity is stated as Lemma 7.1, whose proof is given in the Appendix. Now, by choosing
$\alpha_i = 0 \in \mathbb{R}$, $\beta_i = 0 \in \mathbb{R}^d$, $\eta_i = 0 \in \mathbb{R}^{d \times d}$, and $2\nu_i \nu_i^T + \gamma_i = 0$ for all $1 \le i \le k$, the equation given in
(A2.) of Definition 3.2 is clearly satisfied for all $x$. Since $\nu_i$ and $\gamma_i$ need not be 0, the second-order
identifiability does not hold. □

Identity (7) is the reason that strong identifiability fails for over-fitted location-scale mixture of
Gaussians. We shall see that it also provides the key for uncovering the precise convergence behavior
of the mixing measure in the over-fitted Gaussian mixture model.
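Identity (7) is easy to check numerically in the univariate case $d = 1$, where $\Sigma$ reduces to a variance. The sketch below compares a finite-difference second derivative in the location with twice a finite-difference first derivative in the variance; the step sizes, the evaluation points and the function names are assumptions of this illustration, not part of the paper.

```python
import math

def gaussian_pdf(x, theta, var):
    """Univariate Gaussian density with mean theta and variance var (the d = 1 case)."""
    return math.exp(-(x - theta) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def second_derivative_in_theta(x, theta, var, h=1e-4):
    """Central second difference in the location parameter."""
    return (gaussian_pdf(x, theta + h, var) - 2.0 * gaussian_pdf(x, theta, var)
            + gaussian_pdf(x, theta - h, var)) / (h * h)

def first_derivative_in_var(x, theta, var, h=1e-6):
    """Central first difference in the variance parameter."""
    return (gaussian_pdf(x, theta, var + h) - gaussian_pdf(x, theta, var - h)) / (2.0 * h)

def identity_gap(x, theta, var):
    """Return d^2 f / d theta^2 - 2 * d f / d var, which identity (7) says is zero."""
    return second_derivative_in_theta(x, theta, var) - 2.0 * first_derivative_in_var(x, theta, var)

gaps = [abs(identity_gap(x, 0.5, 1.7)) for x in (-2.0, -0.3, 0.5, 1.1, 3.0)]
```

The gaps are at the level of finite-difference error, while the individual derivatives are not small, which is the linear dependence exploited in the proof of Proposition 4.1.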

Let $G_0$ be a fixed probability measure with exactly $k_0$ support points, $\Theta$ be a bounded subset of $\mathbb{R}^d$
and $\Omega$ be a subset of $S_d^{++}$ whose elements have largest eigenvalues bounded above. Let $G$
vary in the larger set $\mathcal{O}_k(\Theta \times \Omega)$, where $k \ge k_0 + 1$. We shall no longer expect bounds of the kind
$V \gtrsim W_2^2$ such as those established by Theorem 3.2. In fact, we can obtain sharp bounds of the type
$V(p_G, p_{G_0}) \gtrsim W_{\overline{r}}^{\overline{r}}(G, G_0)$, where $\overline{r}$ is determined by the (in)solvability of a system of polynomial
equations that we now describe.

For any fixed $k, k_0 \ge 1$ where $k \ge k_0 + 1$, we define $\overline{r} \ge 1$ to be the minimum value of $r \ge 1$ such
that the following system of polynomial equations

$$\sum_{j=1}^{k - k_0 + 1} \sum_{\substack{n_1 + 2n_2 = \alpha \\ n_1, n_2 \ge 0}} \frac{c_j^2 a_j^{n_1} b_j^{n_2}}{n_1! \, n_2!} = 0 \quad \text{for each } \alpha = 1, \ldots, r \qquad (8)$$

does not have any non-trivial solution for the unknowns $(c_1, \ldots, c_{k - k_0 + 1}, a_1, \ldots, a_{k - k_0 + 1}, b_1, \ldots, b_{k - k_0 + 1})$.
A solution is considered non-trivial if $c_1, \ldots, c_{k - k_0 + 1}$ differ from 0 and at least one of $a_1, \ldots, a_{k - k_0 + 1}$
differs from 0.

Remark. This is a system of $r$ polynomial equations for $3(k - k_0 + 1)$ unknowns. The condition
$c_1, \ldots, c_{k - k_0 + 1} \ne 0$ is very important. In fact, if $c_1 = 0$, then by choosing $a_1 \ne 0$, $a_i = 0$ for all
$2 \le i \le k - k_0 + 1$ and $b_j = 0$ for all $1 \le j \le k - k_0 + 1$, we can check that
$\sum_{j=1}^{k - k_0 + 1} \sum_{n_1 + 2n_2 = \alpha,\ n_1, n_2 \ge 0} \frac{c_j^2 a_j^{n_1} b_j^{n_2}}{n_1! \, n_2!} = 0$
is satisfied for all $\alpha \ge 1$. Therefore, without this condition, $\overline{r}$ does not exist.

Example. To get a feel for the system of equations (8), let us consider the case $k = k_0 + 1$, and let
$r = 3$. Then we obtain the equations:

$$c_1^2 a_1 + c_2^2 a_2 = 0$$
$$\frac{1}{2}(c_1^2 a_1^2 + c_2^2 a_2^2) + c_1^2 b_1 + c_2^2 b_2 = 0$$
$$\frac{1}{3!}(c_1^2 a_1^3 + c_2^2 a_2^3) + c_1^2 a_1 b_1 + c_2^2 a_2 b_2 = 0.$$

It is simple to see that a non-trivial solution exists, by choosing $c_2 = c_1 \ne 0$, $a_1 = 1$, $a_2 = -1$, $b_1 =
b_2 = -1/2$. Hence, $\overline{r} \ge 4$. For $r = 4$, the system consists of the three equations given above, plus

$$\frac{1}{4!}(c_1^2 a_1^4 + c_2^2 a_2^4) + \frac{1}{2!}(c_1^2 a_1^2 b_1 + c_2^2 a_2^2 b_2) + \frac{1}{2!}(c_1^2 b_1^2 + c_2^2 b_2^2) = 0.$$

It can be shown in the sequel that this system has no non-trivial solution. Therefore for $k = k_0 + 1$,
we have $\overline{r} = 4$. Determining the exact value of $\overline{r}$ in the general case appears very difficult. Even for
a specific value of $k - k_0$, finding $\overline{r}$ is not easy. There are well-developed methods in computational
algebra for dealing with this type of polynomial equations, such as Groebner bases [Buchberger, 1965]
and resultants [Sturmfels, 2002]. Using the Groebner bases method, we can show that:


Proposition 4.2. (Values of $\overline{r}$)

(i) If $k - k_0 = 1$, $\overline{r} = 4$.

(ii) If $k - k_0 = 2$, $\overline{r} = 6$.

(iii) If $k - k_0 \ge 3$, $\overline{r} \ge 7$.


Remark. The results of this proposition appear to suggest that $\overline{r} = 2(k - k_0 + 1)$. We leave this
as a conjecture.
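The worked example for $k = k_0 + 1$ can be verified in exact rational arithmetic. The sketch below evaluates the left-hand side of the $\alpha$-th equation of system (8) at the candidate $c_1 = c_2 = 1$, $a_1 = 1$, $a_2 = -1$, $b_1 = b_2 = -1/2$; the helper name `equation_lhs` is an assumption of this illustration. It confirms that the first three equations vanish and that this particular candidate violates the fourth. It does not, of course, prove that no non-trivial solution exists for $r = 4$, which is what the Groebner basis computation establishes.

```python
from fractions import Fraction
from math import factorial

def equation_lhs(alpha, c, a, b):
    """Left-hand side of the alpha-th equation in system (8).

    Sums c_j^2 * a_j^n1 * b_j^n2 / (n1! n2!) over components j and over
    all n1, n2 >= 0 with n1 + 2 * n2 == alpha.
    """
    total = Fraction(0)
    for cj, aj, bj in zip(c, a, b):
        for n2 in range(alpha // 2 + 1):
            n1 = alpha - 2 * n2
            total += cj ** 2 * aj ** n1 * bj ** n2 / Fraction(factorial(n1) * factorial(n2))
    return total

# Candidate from the example: c2 = c1 != 0, a1 = 1, a2 = -1, b1 = b2 = -1/2.
c = [Fraction(1), Fraction(1)]
a = [Fraction(1), Fraction(-1)]
b = [Fraction(-1, 2), Fraction(-1, 2)]

values = [equation_lhs(alpha, c, a, b) for alpha in range(1, 5)]
```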

The main result for this section is a precise relationship between the identifiability and convergence
behavior of mixing measures in an over-fitted Gaussian mixture and the solvability of the system of
equations (8).

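Theorem 4.1. (Over-fitted Gaussian mixture) Let $\overline{r}$ be defined as in the preceding paragraphs.

(a) For any $1 \le r < \overline{r}$, there holds:

$$\lim_{\epsilon \to 0} \inf_{G \in \mathcal{O}_k(\Theta \times \Omega)} \left\{ h(p_G, p_{G_0}) / W_1^r(G, G_0) : W_1(G, G_0) \le \epsilon \right\} = 0. \qquad (9)$$

(b) For any $c_0 > 0$, define
$\mathcal{O}_{k, c_0}(\Theta \times \Omega) = \left\{ G = \sum_{i=1}^{k^*} p_i \delta_{(\theta_i, \Sigma_i)} \in \mathcal{O}_k(\Theta \times \Omega) : p_i \ge c_0 \ \forall\ 1 \le i \le k^* \right\}$.
Then, for $G \in \mathcal{O}_{k, c_0}(\Theta \times \Omega)$ and $W_{\overline{r}}(G, G_0)$ sufficiently small, there holds:

$$V(p_G, p_{G_0}) \gtrsim W_{\overline{r}}^{\overline{r}}(G, G_0) \ge W_1^{\overline{r}}(G, G_0).$$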


We make several remarks.

(i) Close investigation of the proof of part (a) and part (b) together shows that $W_{\overline{r}}^{\overline{r}}(G, G_0)$ is the
sharp lower bound for the distance of mixture densities $h(p_G, p_{G_0}) \gtrsim V(p_G, p_{G_0})$ when $c_0$ is
sufficiently small. In particular, we cannot improve the lower bound to $W_1^r$ for any $r < \overline{r}$.

(ii) This theorem yields an interesting link between the convergence behavior of $G$ and the solvability
of the system of equations (8). Part (b) implies that for any standard estimation method such as the MLE,
which yields $n^{-1/2}$ convergence rate under Hellinger distance for the mixture density under fairly
general conditions, the convergence rate for $G$ under $W_{\overline{r}}$ is $n^{-1/(2\overline{r})}$. Moreover, part (a) entails
that $n^{-1/(2\overline{r})}$ is also a minimax lower bound for $G$ under $W_{\overline{r}}$ or $W_1$ distance.

(iii) The convergence behavior of $G$ depends only on the number of extra mixing components $k - k_0$
assumed in the finite mixture model. The convergence rate deteriorates astonishingly fast as
$k - k_0$ increases. For a practitioner this amounts to a sober caution against over-fitting the
mixture model with many more Gaussian components than actually needed.

(iv) As we have seen from part (b) of Theorem 4.1, under the general setting of $k - k_0$, $G$ is restricted
from the set $\mathcal{O}_k(\Theta \times \Omega)$ to $\mathcal{O}_{k, c_0}(\Theta \times \Omega)$, which places a constraint on the mixing probability mass.
However, this restriction seems to be an artifact of our proof technique. In fact, it can be removed
with extra hard work, at least for the case $k - k_0 \le 2$, as the following proposition demonstrates:

Proposition 4.3. Let $k - k_0 = 1$ or $2$. For $G \in \mathcal{O}_k(\Theta \times \Omega)$ and $W_{\overline{r}}(G, G_0)$ sufficiently small,

$$V(p_G, p_{G_0}) \gtrsim W_{\overline{r}}^{\overline{r}}(G, G_0).$$


4.2 Mixture of Gamma distributions and the location extension

The Gamma family of univariate densities takes the form $f(x|a, b) := \frac{b^a}{\Gamma(a)} x^{a-1} \exp(-bx)$ for $x > 0$,
and 0 otherwise, where $a, b$ are positive shape and rate parameters, respectively.

Proposition 4.4. The Gamma family of distributions is not identifiable in the first order. 

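Proof. The proof is immediate thanks to the following algebraic identity, which holds for any $a, b > 0$:

$$\frac{\partial f}{\partial b}(x|a, b) = \frac{a}{b} f(x|a, b) - \frac{a}{b} f(x|a + 1, b). \qquad (10)$$

Now take $k = 2$, $a_2 = a_1 - 1$, $b_1 = b_2$. By choosing $\beta_1 = \beta_2 = 0$, $\gamma_1 = 0$, $\alpha_1 b_1 = \gamma_2 a_2$,
$\alpha_2 b_1 = -\gamma_2 a_2$ and $\alpha_1 = -\alpha_2 \ne 0$, we can verify that

$$\sum_{i=1}^{2} \alpha_i f(x|a_i, b_i) + \beta_i \frac{\partial f}{\partial a}(x|a_i, b_i) + \gamma_i \frac{\partial f}{\partial b}(x|a_i, b_i) = 0.$$

□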
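Both identity (10) and the resulting linear dependence can be checked numerically. In the sketch below the derivative in the rate is approximated by a central difference, and the coefficients follow the choice made in the proof with $\gamma_2 = 1$; the parameter values, evaluation points and function names are assumptions of this illustration.

```python
import math

def gamma_pdf(x, a, b):
    """Gamma density with shape a and rate b, evaluated at x > 0."""
    return math.exp(a * math.log(b) - math.lgamma(a) + (a - 1.0) * math.log(x) - b * x)

def d_gamma_pdf_db(x, a, b, h=1e-6):
    """Central-difference approximation of the derivative in the rate b."""
    return (gamma_pdf(x, a, b + h) - gamma_pdf(x, a, b - h)) / (2.0 * h)

def identity_10_gap(x, a, b):
    """df/db - [(a/b) f(x|a,b) - (a/b) f(x|a+1,b)], which identity (10) says is zero."""
    rhs = (a / b) * gamma_pdf(x, a, b) - (a / b) * gamma_pdf(x, a + 1.0, b)
    return d_gamma_pdf_db(x, a, b) - rhs

def dependent_combination(x, a1, b, gamma2=1.0):
    """Linear combination from the proof, with a2 = a1 - 1 and b1 = b2 = b."""
    a2 = a1 - 1.0
    alpha1 = gamma2 * a2 / b   # encodes alpha1 * b = gamma2 * a2
    alpha2 = -alpha1           # encodes alpha2 * b = -gamma2 * a2
    return (alpha1 * gamma_pdf(x, a1, b) + alpha2 * gamma_pdf(x, a2, b)
            + gamma2 * d_gamma_pdf_db(x, a2, b))

points = (0.2, 1.0, 2.5, 6.0)
gaps = [abs(identity_10_gap(x, 2.5, 1.3)) for x in points]
combos = [abs(dependent_combination(x, 3.0, 1.3)) for x in points]
```

The combination vanishes only because the two components share a rate and their shapes differ by exactly one, which is the parameter configuration singled out as pathological in the results that follow.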

The Gamma family is still strongly identifiable in the first order if either the shape or the rate parameter is
fixed. It is when both parameters are allowed to vary that strong identifiability is violated. Thus, neither
Theorem 3.1 nor Theorem 3.2 is applicable to shape-rate Gamma mixtures. Comparing the algebraic
identity (7) for the Gaussian and (10) for the Gamma reveals an interesting feature for the latter. In
particular, the linear dependence of the collection of Gamma density functions and its derivatives is
due to certain specific combinations of the Gamma parameter values. This suggests that outside of
these value combinations the Gamma densities may well be identifiable in the first order and even the
second order. Indeed, this observation leads to the following results, which we shall state in two separate
mixture settings.

Fix the true mixing measure $G_0 = \sum_{i=1}^{k_0} p_i^0 \delta_{(a_i^0, b_i^0)} \in \mathcal{E}_{k_0}(\Theta)$ where $k_0 \ge 2$ and $\Theta \subset \mathbb{R}^2_+$.

Theorem 4.2. (Exact-fitted Gamma mixtures)

(a) (Generic cases) Assume that $\{|a_i^0 - a_j^0|, |b_i^0 - b_j^0|\} \ne \{1, 0\}$ for all $1 \le i, j \le k_0$, and $a_i^0 \ge 1$
for all $1 \le i \le k_0$. Then for $G \in \mathcal{E}_{k_0}(\Theta)$ and $W_1(G, G_0)$ sufficiently small, we have

$$V(p_G, p_{G_0}) \gtrsim W_1(G, G_0).$$

(b) (Pathological cases) If there exist $1 \le i, j \le k_0$ such that $\{|a_i^0 - a_j^0|, |b_i^0 - b_j^0|\} = \{1, 0\}$, then
for any $r \ge 1$,

$$\lim_{\epsilon \to 0} \inf_{G \in \mathcal{E}_{k_0}(\Theta)} \left\{ V(p_G, p_{G_0}) / W_r^r(G, G_0) : W_r(G, G_0) \le \epsilon \right\} = 0.$$


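Turning to the over-fitted Gamma mixture setting, as before let $G_0 \in \mathcal{E}_{k_0}(\Theta)$, while $G$ varies in a
larger subset of $\mathcal{O}_k(\Theta)$ for some given $k \ge k_0 + 1$.

Theorem 4.3. (Over-fitted Gamma mixture)

(a) (Generic cases) Assume that $\{|a_i^0 - a_j^0|, |b_i^0 - b_j^0|\} \notin \{\{1, 0\}, \{2, 0\}\}$ for all $1 \le i, j \le k_0$,
and $a_i^0 \ge 1$ for all $1 \le i \le k_0$. For any $c_0 > 0$, define a subset of $\mathcal{O}_k(\Theta)$:

$$\mathcal{O}_{k, c_0}(\Theta) = \left\{ G = \sum_{i=1}^{k'} p_i \delta_{(a_i, b_i)} : k' \le k \text{ and } |a_i - a_j^0| \notin [1 - c_0, 1 + c_0] \cup [2 - c_0, 2 + c_0] \ \forall (i, j) \right\}.$$

Then, for $G \in \mathcal{O}_{k, c_0}(\Theta)$ and $W_2(G, G_0)$ sufficiently small, we have

$$V(p_G, p_{G_0}) \gtrsim W_2^2(G, G_0).$$

(b) (Necessity of restriction on $G$) Under the same assumptions on $G_0$, for any $r \ge 1$,

$$\lim_{\epsilon \to 0} \inf_{G \in \mathcal{O}_k(\Theta)} \left\{ V(p_G, p_{G_0}) / W_r^r(G, G_0) : W_r(G, G_0) \le \epsilon \right\} = 0.$$

(c) (Pathological cases) If there exist $1 \le i, j \le k_0$ such that $\{|a_i^0 - a_j^0|, |b_i^0 - b_j^0|\} \in \{\{1, 0\}, \{2, 0\}\}$,
then for any $r \ge 1$ and any $c_0 > 0$,

$$\lim_{\epsilon \to 0} \inf_{G \in \mathcal{O}_{k, c_0}(\Theta)} \left\{ V(p_G, p_{G_0}) / W_r^r(G, G_0) : W_r(G, G_0) \le \epsilon \right\} = 0.$$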


Part (a) of both theorems asserts that outside of a measure zero set of the true mixing measure $G_0$,
we can still consider the Gamma mixture as if it is strongly identifiable: the strong bounds $V \gtrsim W_1$ and
$V \gtrsim W_2^2$ continue to hold. In these so-called generic cases, if we take any standard estimation method
that yields $n^{-1/2}$ convergence rate under Hellinger/variational distance for the mixture density $p_G$, the
corresponding convergence for $G$ will be $n^{-1/2}$ for exact-fitted and $n^{-1/4}$ for over-fitted mixtures.
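The arithmetic behind such rate statements is the same throughout: a lower bound of the form $V \gtrsim W^s$ combined with a density rate $n^{-1/2}$ gives $W \lesssim n^{-1/(2s)}$. The sketch below tabulates the resulting exponents for the bounds stated in Theorems 3.1, 3.2 and 4.1; the function name and the dictionary labels are assumptions made for this illustration.

```python
def mixing_measure_rate_exponent(bound_power, density_rate_exponent=0.5):
    """Exponent s with W = O(n^-s), given V >= C * W^bound_power and V = O(n^-density_rate_exponent)."""
    return density_rate_exponent / bound_power

# Power of the Wasserstein distance appearing in each lower bound on V.
settings = {
    "exact-fitted, strongly identifiable (V >= C W_1)": 1,
    "over-fitted, strongly identifiable (V >= C W_2^2)": 2,
    "over-fitted Gaussian, k - k0 = 1 (r-bar = 4)": 4,
    "over-fitted Gaussian, k - k0 = 2 (r-bar = 6)": 6,
}

exponents = {name: mixing_measure_rate_exponent(power) for name, power in settings.items()}
```

Under these bounds the exponents are $1/2$, $1/4$, $1/8$ and $1/12$ respectively, which makes the cost of each extra Gaussian component explicit.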

The situation is not so forgiving for the so-called pathological cases in both settings: it is not
possible to obtain the bound of the form $V \gtrsim W_r^r$ for any $r \ge 1$. A consequence of this result is a
minimax lower bound $n^{-1/r}$ under $W_r$ for the estimation of $G$, for any $r \ge 1$. This implies that, even
for the exact-fitted mixture, the convergence of Gamma parameters $a_i$ and $b_i$ to the true values cannot
be faster than $n^{-1/r}$ for any $r \ge 1$. In other words, the convergence of these parameters is most likely
logarithmic.

Location extension. Before ending this subsection, we introduce a location extension of the Gamma
family, for which the convergence behavior of its parameters is always slow. Actually, this is the
location extension of the exponential distribution (which is a special case of Gamma by fixing the shape
parameter $a = 1$). The location-exponential distribution $\{f(x|\theta, \sigma), \theta \in \mathbb{R}, \sigma \in \mathbb{R}_+\}$ is parameterized
as $f(x|\theta, \sigma) := \frac{1}{\sigma} \exp\left(-\frac{x - \theta}{\sigma}\right) 1_{\{x > \theta\}}$ for all $x \in \mathbb{R}$. Direct calculation yields that

$$\frac{\partial f}{\partial \theta}(x|\theta, \sigma) = \frac{1}{\sigma} f(x|\theta, \sigma) \quad \text{when } x \ne \theta. \qquad (11)$$


This algebraic identity is similar to that of the location-scale multivariate Gaussian distribution, except for the non-constant coefficient $1/\sigma$. Since this identity holds in general, we would expect non-standard convergence behavior for $G$. This is indeed the case. We shall state a result for the exact-fitted setting only. Let $\Theta = \mathbb{R} \times \mathbb{R}_+$, and $G_0 = \sum_{i=1}^{k_0} p_i^0 \delta_{(\theta_i^0, \sigma_i^0)} \in \mathcal{E}_{k_0}(\Theta)$ where $k_0 \ge 2$.

Theorem 4.4. (Exact-fitted location-exponential mixtures) For any $r \ge 1$,

\[ \lim_{\epsilon \to 0} \inf_{G \in \mathcal{E}_{k_0}(\Theta)} \left\{ V(p_G, p_{G_0})/W_1^r(G, G_0) : W_1(G, G_0) \le \epsilon \right\} = 0. \]

Unlike Gamma mixtures, there is no generic/pathological dichotomy for mixtures of location-exponential distributions. The convergence behavior of the mixing measure $G$ is always extremely slow: even in the exact-fitted setting, the minimax lower bound for $G$ under $W_1$ is no smaller than $n^{-1/r}$ for any $r \ge 1$. The convergence rate of the model parameters is most likely logarithmic.
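The identity (11) can be checked numerically. The following is a minimal sketch, not part of the original development: it compares a central finite difference in $\theta$ with $f/\sigma$ at points $x \neq \theta$. The function names and the step size `h` are illustrative assumptions.

```python
import math

def loc_exp_pdf(x, theta, sigma):
    """Location-exponential density: (1/sigma) * exp(-(x - theta)/sigma) for x > theta, else 0."""
    if x <= theta:
        return 0.0
    return math.exp(-(x - theta) / sigma) / sigma

def dpdf_dtheta(x, theta, sigma, h=1e-6):
    """Central finite-difference approximation of the partial derivative in theta (assumed step h)."""
    return (loc_exp_pdf(x, theta + h, sigma) - loc_exp_pdf(x, theta - h, sigma)) / (2 * h)

def identity_gap(x, theta, sigma):
    """Absolute gap between df/dtheta and (1/sigma) * f; expected to be near zero when x != theta."""
    return abs(dpdf_dtheta(x, theta, sigma) - loc_exp_pdf(x, theta, sigma) / sigma)

if __name__ == "__main__":
    for x in (0.5, 1.5, 4.0):
        print(x, identity_gap(x, theta=1.0, sigma=0.7))
```

The check deliberately avoids $x = \theta$, where the density is discontinuous and the identity is not claimed.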


4.3 Mixture of skew-Gaussian distributions 


The skew-normal density takes the form $f(x|\theta,\sigma,m) := \frac{2}{\sigma} f\left(\frac{x-\theta}{\sigma}\right) \Phi(m(x-\theta)/\sigma)$, where $f(x) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{x^2}{2}\right)$ and $\Phi(x) = \int_{-\infty}^{x} f(t)\,dt$. Here $m \in \mathbb{R}$ is the shape, $\theta$ the location and $\sigma$ the scale parameter. This generalizes the Gaussian family, which corresponds to fixing $m = 0$. In general, letting $m \neq 0$ makes the density asymmetric (skew), with the skewness direction dictated by the sign of $m$. We will see that this density class enjoys an extremely rich range of behaviors.

We first focus on exact-fitted mixtures of skew-Gaussian distributions. Note that:


Proposition 4.5. The skew-Gaussian family $\{f(x|\theta,\sigma,m), \theta \in \mathbb{R}, \sigma \in \mathbb{R}_+, m \in \mathbb{R}\}$ is not identifiable in the first order.

An examination of the proof of Proposition 4.5 reveals that, like the Gamma family, there are certain combinations of the skew-Gaussian distribution's parameter values that prevent the skew-Gaussian family from satisfying strong identifiability conditions. Outside of these "pathological" combinations, the skew-Gaussian mixtures continue to enjoy strong convergence properties. Unlike the Gamma family, however, the pathological cases have very rich structures, which result in a varied range of convergence behaviors we have seen in both Gamma and Gaussian mixtures.

Throughout this section, $\{f(x|\theta,\sigma,m), (\theta,m) \in \Theta, \sigma \in \Omega\}$ is a class of skew-Gaussian density functions where $\Theta \subset \mathbb{R}^2$ and $\Omega \subset \mathbb{R}_+$. Fix the true mixing measure $G_0 = \sum_{i=1}^{k_0} p_i^0 \delta_{(\theta_i^0, \sigma_i^0, m_i^0)}$. Assume that $\sigma_i^0$ are pairwise different and $\frac{(\sigma_i^0)^2}{1+(m_i^0)^2} \notin \left\{ (\sigma_j^0)^2 : 1 \le j \neq i \le k_0 \right\}$ for all $1 \le i \le k_0$. For each $1 \le j \le k_0$, define the cousin set for $j$ to be

\[ I_j = \left\{ i \neq j : \left( \frac{(\sigma_i^0)^2}{1+(m_i^0)^2}, \theta_i^0 \right) = \left( \frac{(\sigma_j^0)^2}{1+(m_j^0)^2}, \theta_j^0 \right) \right\}. \]

The cousin set consists of the indices of skew-Gaussian components that share the same location and a rescaled version of the scale parameter. We further say that a non-empty cousin set $I_j$ is conformant if for any $i \in I_j$, $m_i^0 m_j^0 > 0$. To delineate the structure underlying parameter values of $G_0$, we define a sequence of increasingly weaker conditions.

(S1) $m_i^0 \neq 0$ and $I_i$ is empty for all $i = 1, \ldots, k_0$.

(S2) There exists at least one set $I_i$ that is non-empty. Moreover, for any $1 \le i \le k_0$, if $|I_i| \ge 1$, $I_i$ is conformant.

(S3) There exists at least one set $I_i$ that is non-empty. Additionally, there is $k^* \in [1, k_0 - 1]$ such that for any non-empty and non-conformant cousin set $I_i$, we have $|I_i| \le k^*$.


We make several clarifying comments.


(i) Condition (S1) corresponds to generic situations of true parameter values where the exact-fitted mixture of skew-Gaussians will be shown to enjoy behaviors akin to strong identifiability. It requires that the true mixture corresponding to $G_0$ has no Gaussian components and no cousins for all skew-Gaussian components.

(ii) Condition (S2) allows the presence of Gaussian components and/or non-empty cousin sets, all of which have to be conformant.

(iii) (S3) is introduced to address the presence of non-conformant cousin sets.
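To make the cousin-set and conformance definitions concrete, the following minimal sketch computes the cousin sets for a list of components $(\theta, m, \sigma)$. It is an illustration only: the function names and the numerical tolerance are assumptions, and the two parameter configurations are the generic and conformant cases used in the simulations of Section 5.2.

```python
import math

def rescaled_scale(sigma, m):
    """sigma^2 / (1 + m^2), the rescaled scale compared across components."""
    return sigma ** 2 / (1.0 + m ** 2)

def cousin_sets(components, tol=1e-9):
    """components: list of (theta, m, sigma). Returns {j: [i, ...]} with 0-based indices."""
    sets = {}
    for j, (tj, mj, sj) in enumerate(components):
        sets[j] = [
            i for i, (ti, mi, si) in enumerate(components)
            if i != j
            and math.isclose(ti, tj, abs_tol=tol)
            and math.isclose(rescaled_scale(si, mi), rescaled_scale(sj, mj), rel_tol=tol)
        ]
    return sets

def is_conformant(j, cousins, components):
    """A non-empty cousin set I_j is conformant if m_i * m_j > 0 for every i in I_j."""
    mj = components[j][1]
    return len(cousins) > 0 and all(components[i][1] * mj > 0 for i in cousins)

if __name__ == "__main__":
    generic = [(-2.0, 1.0, 1.0), (4.0, 2.0, 2.0), (-5.0, -3.0, 3.0)]
    conformant = [(-2.0, 0.0, 1.0), (4.0, math.sqrt(3.0), 2.0), (4.0, math.sqrt(8.0), 3.0)]
    print(cousin_sets(generic))      # all cousin sets empty, as required by (S1)
    sets = cousin_sets(conformant)
    print(sets, [is_conformant(j, sets[j], conformant) for j in sets])
```

In the conformant configuration the second and third components share the location 4 and the rescaled scale $4/(1+3) = 9/(1+8) = 1$, so they are cousins with shape parameters of the same sign.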


Theorem 4.5. (Exact-fitted conformant skew-Gaussian mixtures)

(a) (Generic cases) If (S1) is satisfied, then for any $G \in \mathcal{E}_{k_0}(\Theta \times \Omega)$ such that $W_1(G, G_0)$ is sufficiently small, there holds

\[ V(p_G, p_{G_0}) \gtrsim W_1(G, G_0). \]

(b) (Conformant cases) If (S2) is satisfied, then for any $G \in \mathcal{E}_{k_0}(\Theta \times \Omega)$ such that $W_2(G, G_0)$ is sufficiently small, there holds

\[ V(p_G, p_{G_0}) \gtrsim W_2^2(G, G_0). \]

Moreover, this lower bound is sharp.

When only condition (S3) holds, the convergence behavior of the exact-fitted skew-Gaussian mixture is linked to the (in)solvability of a system of polynomial equations. Specifically, define $s$ to be the minimum value of $r \ge 1$ such that the following system of polynomial equations

\[ \sum_{i=1}^{k^*+1} a_i b_i^u c_i^v = 0 \qquad (12) \]

does not admit any non-trivial solution. By non-trivial, we require that $a_i > 0$ for all $i = 1, \ldots, k^*+1$, all $b_i \neq 0$ and pairwise different, $(a_i, |b_i|) \neq (a_j, |b_j|)$ for all $1 \le i \neq j \le k^*+1$, and at least one of the $c_i$ differs from 0. The indices $u, v$ in this system satisfy $1 \le v \le r$, where $u \le v$ ranges over all odd numbers when $v$ is even, and $0 \le u \le v$ ranges over all even numbers when $v$ is odd. For example, if $r = 3$ and $k^* = 1$, the above system of polynomial equations is

\[ a_1 c_1 + a_2 c_2 = 0, \]
\[ a_1 b_1 c_1^2 + a_2 b_2 c_2^2 = 0, \]
\[ a_1 c_1^3 + a_2 c_2^3 = 0, \]
\[ a_1 b_1^2 c_1^3 + a_2 b_2^2 c_2^3 = 0. \]

Similar to the system of equations (8) that arises in our theory for Gaussian mixtures, the exact value of $s$ is hard to determine in general. The following proposition gives specific values for $s$.

Proposition 4.6. (Values of s) 

(i) If $k^* = 1$, $s = 3$.

(ii) If $k^* = 2$, $s = 5$.
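The index set of system (12) is easy to get wrong, so the following minimal sketch enumerates the admissible pairs $(u, v)$ for a given $r$ and evaluates the left-hand sides for candidate values of $(a_i, b_i, c_i)$. The function names are illustrative assumptions; the sketch does not attempt to solve the system or to determine $s$.

```python
def index_pairs(r):
    """(u, v) exponent pairs of system (12): 1 <= v <= r, 0 <= u <= v, u and v of opposite parity."""
    return [(u, v) for v in range(1, r + 1) for u in range(0, v + 1) if (u + v) % 2 == 1]

def residuals(a, b, c, r):
    """Left-hand sides sum_i a_i * b_i**u * c_i**v for every admissible (u, v)."""
    return [sum(ai * bi ** u * ci ** v for ai, bi, ci in zip(a, b, c))
            for u, v in index_pairs(r)]

if __name__ == "__main__":
    print(index_pairs(3))  # the four equations of the r = 3 example
    # a = (1, 1), b = (1, -1), c = (1, -1) makes every left-hand side vanish, but it is
    # excluded as trivial because (a_1, |b_1|) = (a_2, |b_2|).
    print(residuals([1, 1], [1, -1], [1, -1], 3))
```

The excluded candidate in the usage example cancels every equation for every $r$, since $b_2^u c_2^v = (-1)^{u+v} = -1$ whenever $u + v$ is odd. This shows why the condition $(a_i, |b_i|) \neq (a_j, |b_j|)$ is part of the definition of non-triviality.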

The following theorem describes the role of $s$ in the non-conformant case of skew-Gaussian mixtures:

Theorem 4.6. (Exact-fitted non-conformant skew-Gaussian mixtures) Suppose that (S3) holds.

(a) Assume further that for any non-conformant cousin set $I_i$ we have $(p_i^0, |m_i^0|) \neq (p_j^0, |m_j^0|)$ for any $j \in I_i$. Then, for any $G \in \mathcal{E}_{k_0}(\Theta \times \Omega)$ such that $W_s(G, G_0)$ is sufficiently small,

\[ V(p_G, p_{G_0}) \gtrsim W_s^s(G, G_0). \]

(b) If the assumption of part (a) does not hold, then for any $r \ge 1$,

\[ \lim_{\epsilon \to 0} \inf_{G \in \mathcal{E}_{k_0}(\Theta \times \Omega)} \left\{ V(p_G, p_{G_0})/W_1^r(G, G_0) : W_1(G, G_0) \le \epsilon \right\} = 0. \]

We note that the lower bound established in part (a) may not be sharp. Nonetheless, it can be used to derive an upper bound on the convergence of $G$ for any standard estimation method: an $n^{-1/2}$ convergence rate for $p_G$ under the variational distance entails an $n^{-1/(2s)}$ convergence rate for $G$ under $W_s$. If the assumption of part (a) fails to hold, no polynomial rate (in terms of $n^{-1}$) is possible, as can be inferred from part (b).

Over-fitted skew-Gaussian mixtures. As with Gaussian mixtures, the analysis of over-fitted skew-Gaussian mixtures hinges upon the algebraic structure of the density function and its derivatives taken up to the second order. The fundamental identity for the skew-Gaussian density is

\[ \frac{\partial^2 f}{\partial \theta^2}(x|\theta,\sigma,m) - 2\frac{\partial f}{\partial \sigma^2}(x|\theta,\sigma,m) + \frac{m^3+m}{\sigma^2}\frac{\partial f}{\partial m}(x|\theta,\sigma,m) = 0. \qquad (13) \]

The proof for this identity is in Lemma 7.2. This implies that the skew-Gaussian class is without exception not identifiable in the second order. By no exception, we mean that there is no generic/pathological dichotomy due to certain combinations of the parameter values as we have seen in the first-order analysis. Note that if $m = 0$ this is reduced to Eq. (7) in the univariate case. The presence of the nonlinear coefficient $(m^3+m)/\sigma^2$, which depends on both $m$ and $\sigma$, makes the analysis of the skew-Gaussians much more complex than that of the Gaussians.
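Identity (13) can be sanity-checked numerically. The sketch below is an illustration under stated assumptions, not the proof: it parameterizes the density by $v = \sigma^2$, approximates the three partial derivatives by central finite differences with an assumed step `h`, and returns the left-hand side of (13), which should be close to zero.

```python
import math

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def skew_pdf(x, theta, v, m):
    """Skew-normal density with location theta, v = sigma^2, and shape m."""
    sigma = math.sqrt(v)
    z = (x - theta) / sigma
    return 2.0 / sigma * phi(z) * Phi(m * z)

def identity_residual(x, theta, v, m, h=1e-4):
    """Finite-difference value of d2f/dtheta2 - 2 df/dv + (m^3 + m)/v * df/dm."""
    f = lambda t, w, s: skew_pdf(x, t, w, s)
    d2_theta = (f(theta + h, v, m) - 2.0 * f(theta, v, m) + f(theta - h, v, m)) / h ** 2
    d_v = (f(theta, v + h, m) - f(theta, v - h, m)) / (2.0 * h)
    d_m = (f(theta, v, m + h) - f(theta, v, m - h)) / (2.0 * h)
    return d2_theta - 2.0 * d_v + (m ** 3 + m) / v * d_m

if __name__ == "__main__":
    for x, theta, v, m in [(0.3, 0.0, 1.5, 2.0), (1.2, 0.5, 2.0, -1.5), (-0.8, 0.5, 0.7, 0.0)]:
        print(x, theta, v, m, identity_residual(x, theta, v, m))
```

The last parameter setting uses $m = 0$, in which case the third term drops out and the check reduces to the Gaussian identity.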

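The following theorem gives a bound of the type $V \gtrsim W_r^r$, under some conditions.

Theorem 4.7. (Over-fitted skew-Gaussian mixtures) Assume that the support points of $G_0$ satisfy the condition (S1). Let $k \ge k_0 + 1$ and let $\overline{r} \ge 1$ be defined as in (8). For a fixed positive constant $c_0 > 0$, we define a subset of $\mathcal{O}_k(\Theta \times \Omega)$:

\[ \mathcal{O}_{k,c_0}(\Theta \times \Omega) = \left\{ G = \sum_{i=1}^{k^*} p_i \delta_{(\theta_i, \sigma_i, m_i)} \in \mathcal{O}_k(\Theta \times \Omega) : p_i \ge c_0 \ \forall\ 1 \le i \le k^* \le k \right\}. \]

Then, for any $G \in \mathcal{O}_{k,c_0}(\Theta \times \Omega)$ with $W_{\overline{m}}(G, G_0)$ sufficiently small, there holds

\[ V(p_G, p_{G_0}) \gtrsim W_{\overline{m}}^{\overline{m}}(G, G_0), \]

where $\overline{m} = \overline{r}$ if $\overline{r}$ is even, and $\overline{m} = \overline{r} + 1$ if $\overline{r}$ is odd.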



Remarks. 


(i) If $k - k_0 = 1$, we can allow $G \in \mathcal{O}_k(\Theta \times \Omega)$, and the above bound holds for $\overline{m} = 4$. Moreover this bound is sharp.


(ii) Our proof exploits assumption (S1), which entails the linear independence structure of high order derivatives of $f$ with respect to only $\theta$ and $m$, and the intrinsic dependence of $\frac{\partial^2 f}{\partial \theta^2}$ on $\frac{\partial f}{\partial \sigma^2}$. Although we make use of Eq. (13) in the proof, we do not fully account for the dependence of $\frac{\partial^2 f}{\partial \theta^2}$ on $\frac{\partial f}{\partial m}$ as well as the nonlinear coefficient $(m^3+m)/\sigma^2$. For these reasons the bound produced in this theorem may not be sharp in general.

(iii) If k — ko = 2, it seems that the best lower bound for V{pg,PGo) is Wf{G,Go). (See the 
arguments following the proof of Theorem l4.7l in the Appendix). 


(iv) The analysis of the lower bound of $V(p_G, p_{G_0})$ when $G_0$ satisfies either (S2) or (S3) is highly non-trivial since these conditions involve complex dependence among high order derivatives of $f$. This is beyond the scope of this paper.


5 Minimax lower bounds, MLE rates and illustrations 

5.1 Convergence of MLE and minimax lower bounds 

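We are given an i.i.d. sample $X_1, X_2, \ldots, X_n$ distributed according to the mixture density $p_{G_0}$, where $G_0$ is the unknown true mixing distribution with exactly $k_0$ support points, and the class of densities $\{f(x|\theta,\Sigma), \theta \in \Theta, \Sigma \in \Omega\}$ is assumed known. Let $k \in \mathbb{N}$ be given such that $k \ge k_0 + 1$. The support of $G_0$ is $\Theta \times \Omega$. In this section we shall assume that $\Theta$ is a compact subset of $\mathbb{R}^{d_1}$ and $\Omega = \left\{ \Sigma \in S_{d_2}^{++} : \underline{\lambda} \le \sqrt{\lambda_1(\Sigma)} \le \sqrt{\lambda_d(\Sigma)} \le \overline{\lambda} \right\}$, where $0 < \underline{\lambda}, \overline{\lambda}$ are known and $d_1 \ge 1$, $d_2 \ge 0$. The maximum likelihood estimator for $G_0$ in the over-fitted mixture setting is given by

\[ \widehat{G}_n = \arg\max_{G \in \mathcal{O}_k(\Theta \times \Omega)} \sum_{i=1}^{n} \log(p_G(X_i)). \]

For the exact-fitted mixture setting, $\mathcal{O}_k$ is replaced by $\mathcal{E}_{k_0}$.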
According to the standard asymptotic theory for the MLE (cf., e.g., van de Geer [1996]), under the boundedness assumptions given above, along with a sufficient regularity condition on the smoothness of density $f$, one can show that the MLE for the mixture density yields a $(\log n/n)^{1/2}$ rate under the Hellinger distance. That is, $h(p_{\widehat{G}_n}, p_{G_0}) = O_P((\log n/n)^{1/2})$, where $O_P$ denotes an in-probability bound. It is relatively simple to verify that this bound is applicable to all density classes considered in this paper. As a consequence, whenever an identifiability bound of the form $V \gtrsim W_r^r$ holds, we obtain that $W_r(\widehat{G}_n, G_0) \lesssim (\log n/n)^{1/2r}$ in probability.
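The passage from a density rate to a parameter rate is a simple power transformation: if $W_r^r \lesssim V \le h$ and $h$ is of order $(\log n/n)^{1/2}$, then $W_r$ is of order $(\log n/n)^{1/(2r)}$. The following sketch tabulates how quickly the implied rate deteriorates as $r$ grows. Constants are ignored, and the function names and the chosen values of $n$ and $r$ are illustrative assumptions.

```python
import math

def density_rate(n):
    """(log n / n)^(1/2): the Hellinger rate of the MLE for the mixture density."""
    return math.sqrt(math.log(n) / n)

def mixing_measure_rate(n, r):
    """(log n / n)^(1/(2r)): the implied W_r rate when V >~ W_r^r holds (constants ignored)."""
    return density_rate(n) ** (1.0 / r)

if __name__ == "__main__":
    for n in (10 ** 3, 10 ** 5, 10 ** 7):
        print(n, [round(mixing_measure_rate(n, r), 4) for r in (1, 2, 4, 6)])
```

For a fixed sample size, a larger order $r$ in the identifiability bound always yields a larger (slower) implied rate, because the density rate is below one.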

Furthermore, if we can also show that $h \ge V \gtrsim W_r^r$ is the best bound possible in a precise sense, for instance in the sense given by part (c) of Theorem 3.2 (for $r = 2$) or part (a) of Theorem 4.1 (for $r = \overline{r}$), then an immediate consequence, by invoking Le Cam's method (cf. Yu [1997]), is the following minimax lower bound:

\[ \inf_{\widehat{G}_n} \sup_{G_0} E_{p_{G_0}} W_1(\widehat{G}_n, G_0) \gtrsim n^{-1/(2r')}, \]

where $r'$ is any constant $r' \in [1, r)$, the supremum is taken over the given set of possible values for $G_0$, and the infimum is taken over all possible estimators. Combining with an upper bound of the form $(\log n/n)^{1/2r}$ guaranteed by the MLE method, we conclude that $n^{-1/2r}$ is the optimal estimation rate, up to a logarithmic term, under the $W_r$ distance for the mixing measure.

For mixtures of Gamma, location-exponential and skew-Gaussian distributions, we have seen pathological settings where $V$ cannot be lower bounded by a multiple of $W_r^r$ for any $r \ge 1$. This entails that the minimax estimation rate cannot be faster than $n^{-1/r}$ for any $r \ge 1$. It follows that the minimax rate for estimating $G_0$ in such settings cannot be faster than a logarithmic rate.

In summary, we obtain a number of convergence rates and minimax lower bounds for the mixing measure under many density classes. They are collected in Table 1.


5.2 Illustrations 

For the remainder of this section we shall illustrate via simulations the rich spectrum of convergence behaviors of the mixing measure in a number of settings. This is reflected by the identifiability bound $V \gtrsim W_r^r$ and its sharpness for varying values of $r$, as well as the convergence rate of the MLE.


Strong identifiability bounds. We illustrate the bound $V \gtrsim W_1$ for exact-fitted mixtures, and $V \gtrsim W_2^2$ for over-fitted mixtures of the class of Student's t-distributions. See Figure 1. The upper bounds of $V$ and $h$ were also proved earlier in Section 2. For details, we choose $\Theta = [-10, 10]^2$ and $\Omega = \left\{ \Sigma \in S_2^{++} : \sqrt{2} \le \sqrt{\lambda_1(\Sigma)} \le \sqrt{\lambda_d(\Sigma)} \le 2 \right\}$. The true mixing probability measure $G_0$ has exactly $k_0 = 2$ support points with locations $\theta_1^0 = (-2, 2)$, $\theta_2^0 = (-4, 4)$, covariances

\[ \Sigma_1^0 = \begin{pmatrix} 9/4 & 1/5 \\ 1/5 & 13/6 \end{pmatrix}, \quad \Sigma_2^0 = \begin{pmatrix} 5/2 & 2/5 \\ 2/5 & 7/3 \end{pmatrix}, \]

and $p_1^0 = 1/3$, $p_2^0 = 2/3$. 5000 random samples of discrete mixing measures $G \in \mathcal{E}_2$ and 5000 samples of $G \in \mathcal{O}_3$ were generated to construct these plots.


Weak identifiability bounds. We experiment with two interesting classes of densities: Gaussian and skew-Gaussian densities. According to our theory, sharp bounds of the form $V \gtrsim W_r^r$ continue to hold, but with varying values of $r$ depending on the specific mixture setting; $r$ can also vary dramatically within the same density class.

The results for mixtures of location-covariance Gaussian distributions are given in Figure 2. Simulation details are as follows. The true mixing measure $G_0$ has exactly $k_0 = 2$ support points with locations $\theta_1^0 = -2$, $\theta_2^0 = 4$, scales $\sigma_1^0 = 1$, $\sigma_2^0 = 2$, and $p_1^0 = 1/3$, $p_2^0 = 2/3$. We generate 5000 random samples of discrete mixing measures $G \in \mathcal{E}_2$, 5000 samples of $G \in \mathcal{O}_3$ and another 5000 for $G \in \mathcal{O}_4$, where the support points are uniformly generated in $\Theta = [-10, 10]$ and $\Omega = [0.5, 5]$.

Figure 1: Mixture of Student's t-distributions. Left: Exact-fitted setting. Right: Over-fitted setting.

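The bounds for skew-Gaussian mixtures are illustrated by Figure 3. Here are the simulation details. The true parameters for the mixing measure $G_0$ are divided into three cases.


• Generic case: $(\theta_1^0, m_1^0, \sigma_1^0) = (-2, 1, 1)$, $(\theta_2^0, m_2^0, \sigma_2^0) = (4, 2, 2)$, $(\theta_3^0, m_3^0, \sigma_3^0) = (-5, -3, 3)$, $p_1^0 = p_2^0 = p_3^0 = 1/3$.


• Conformant case: $(\theta_1^0, m_1^0, \sigma_1^0) = (-2, 0, 1)$, $(\theta_2^0, m_2^0, \sigma_2^0) = (4, \sqrt{3}, 2)$, $(\theta_3^0, m_3^0, \sigma_3^0) = (4, \sqrt{8}, 3)$, $p_1^0 = p_2^0 = p_3^0 = 1/3$.


• Non-conformant case: $(\theta_1^0, m_1^0, \sigma_1^0) = (-2, 0, 1)$, $(\theta_2^0, m_2^0, \sigma_2^0) = (4, -\sqrt{3}, 2)$, $(\theta_3^0, m_3^0, \sigma_3^0) = (4, -\sqrt{8}, 3)$, $p_1^0 = p_2^0 = p_3^0 = 1/3$.


As before, we generate 5000 random samples of discrete mixing measures $G \in \mathcal{E}_2$, 5000 samples of $G \in \mathcal{O}_3$ and another 5000 for $G \in \mathcal{O}_4$, where the support points are uniformly generated in $\Theta = [-10, 10]$ and $\Omega = [0.5, 5]$.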

It can be observed that both lower bounds and upper bounds match exactly our theory developed in 
the previous two sections. 



Figure 2: Location-scale Gaussian mixtures. From left to right: (1) Exact-fitted setting; (2) Over-fitted by one component; 

(3) Over-fitted by two components. 


Convergence rates of MLE. First, we generate n-iid samples from a mixture of location-scale 
multivariate Gaussian distributions which has exactly three components. The true parameters for 
Figure 3: Skew-Gaussian mixtures. From left to right: (1) Exact-fitted generic case; (2) Exact-fitted conformant case; (3) Exact-fitted non-conformant case; (4) Over-fitted by one component.



Figure 4: MLE rates for location-scale mixtures of Gaussians. L to R: (1) Exact-fitted: Wi n (2) Over-fitted by 

one: Wi ~ (3) Over-fitted by two: We ~ 


the mixing measure Go are: 6^ = (0,3),6l^ = (l,-4),6l§ = (5,2), S? = (^^'^324 0 ^ 8 ^ 751 ))’ 

Eg = ^ ^i'^25 1^75^) ’ ~ (^0 4 ) ’ ~ ~ ~ likelihood 

estimators are obtained by the EM algorithm as we assume that the data come from a mixture of $k$ Gaussians where $k \ge k_0 = 3$. See Figure 4, where the Wasserstein distance metrics are plotted against varying sample size $n$. The error bars are obtained by running the experiment 7 times for each $n$.

These simulations are in complete agreement with the established convergence theory and confirm 
that the convergence slows down rapidly as k — ko increases. 

We turn to mixtures of Gamma distributions. There are two cases.

• Generic case: We generate i.i.d. samples of size $n$ from a Gamma mixture model that has exactly two mixing components. The true parameters for the mixing measure $G_0$ are: $a_1^0 = 8$, $a_2^0 = 2$, $b_1^0 = 3$, $b_2^0 = 4$, $\pi_1^0 = 1/3$, $\pi_2^0 = 2/3$.

• Pathological case: We carry out the same procedure as that of the generic case, the only difference being the true parameters of $G_0$. Here we choose $a_1^0 = 8$, $a_2^0 = 7$, $b_1^0 = 3$, $b_2^0 = 3$, $\pi_1^0 = 1/3$, $\pi_2^0 = 2/3$.

It is remarkable to see the wild swing in behaviors within this same class. See Figure 5. Even for exact-fitted finite mixtures of Gamma, one can achieve a very fast convergence rate of $n^{-1/2}$ in the generic case, or sink into a logarithmic rate if the true mixing measure $G_0$ takes on one of the pathological values.
Figure 5: MLE rates for shape-rate mixtures of Gamma distributions. L to R: (1) Generic/Exact-fitted: 
TFi(G„,Go) ~ (2) Generic/Over-fitted: W 2 ~ (3) Pathological/Exact-fitted: Wi « l/(log(n)^/^. (4) 

Pathological/Over-fitted: Wi « l/(log(n)^^^. 


6 Proofs of representative theorems 

There are two types of theorems proved in this paper. The first type are sharp inequalities of the form $V(p_G, p_{G_0}) \gtrsim W_r^r(G, G_0)$ for some precise order $r > 0$ depending on the specific setting of the mixture models. The second type of results are characterization theorems presented in Section 3.2.

In this section we present the proofs for three representative theorems: Theorem 3.1 for strongly identifiable mixtures in the exact-fitted setting, Theorem 3.2 for strongly identifiable mixtures in the over-fitted setting, and Theorem 4.1 for over-fitted Gaussian mixtures (i.e., a weakly identifiable class), as well as Proposition 4.2. These proofs carry important insights underlying the theory; they are organized in a sequence of steps to help the reader. For other density classes (e.g., second order identifiable, Gamma and skew-Gaussian classes) the proofs are similar in spirit to these, but they are of interest in their own right due to the special and rich structures of each density class. Due to space constraints the proofs for these and all other theorems are deferred to the Appendix.

6.1 Strong identifiability in exact-fitted mixtures 
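PROOF OF THEOREM 3.1. It suffices to show that

\[ \lim_{\epsilon \to 0} \inf \left\{ V(p_G, p_{G_0})/W_1(G, G_0) \;\middle|\; W_1(G, G_0) \le \epsilon \right\} > 0, \qquad (14) \]

where the infimum is taken over all $G \in \mathcal{E}_{k_0}(\Theta \times \Omega)$.

Step 1. Suppose that (14) does not hold, which implies that we have a sequence $G_n = \sum_{i=1}^{k_0} p_i^n \delta_{(\theta_i^n, \Sigma_i^n)} \in \mathcal{E}_{k_0}(\Theta \times \Omega)$ converging to $G_0$ in $W_1$ distance such that $V(p_{G_n}, p_{G_0})/W_1(G_n, G_0) \to 0$ as $n \to \infty$. As $W_1(G_n, G_0) \to 0$, the support points of $G_n$ must converge to those of $G_0$. By permutation of the labels $i$, it suffices to assume that for each $i = 1, \ldots, k_0$, $(\theta_i^n, \Sigma_i^n) \to (\theta_i^0, \Sigma_i^0)$. For each pair $(G_n, G_0)$, let $\{q_{ij}^n\}$ denote the corresponding probabilities of the optimal coupling for the $(G_n, G_0)$ pair, so we can write:

\[ W_1(G_n, G_0) = \sum_{1 \le i,j \le k_0} q_{ij}^n \left( \|\theta_i^n - \theta_j^0\| + \|\Sigma_i^n - \Sigma_j^0\| \right). \]

Since $G_n$ and $G_0$ have the same number of support points, it is an easy observation that for sufficiently large $n$, $q_{ii}^n = \min(p_i^n, p_i^0)$. And so, $\sum_{i \neq j} q_{ij}^n \le \sum_{i=1}^{k_0} |p_i^n - p_i^0|$. Adopting the notations $\Delta\theta_i^n := \theta_i^n - \theta_i^0$, $\Delta\Sigma_i^n := \Sigma_i^n - \Sigma_i^0$, and $\Delta p_i^n := p_i^n - p_i^0$ for all $1 \le i \le k_0$, we have

\[ W_1(G_n, G_0) = \sum_{i=1}^{k_0} q_{ii}^n \left( \|\theta_i^n - \theta_i^0\| + \|\Sigma_i^n - \Sigma_i^0\| \right) + \sum_{i \neq j} q_{ij}^n \left( \|\theta_i^n - \theta_j^0\| + \|\Sigma_i^n - \Sigma_j^0\| \right) \]
\[ \lesssim \sum_{i=1}^{k_0} p_i^n \left( \|\Delta\theta_i^n\| + \|\Delta\Sigma_i^n\| \right) + |\Delta p_i^n| =: d(G_n, G_0). \]

The inequality in the above display is due to $q_{ii}^n \le p_i^n$, and the observation that $\|\theta_i^n - \theta_j^0\|$, $\|\Sigma_i^n - \Sigma_j^0\|$ are bounded for all $1 \le i, j \le k_0$ for sufficiently large $n$. Thus, we have $V(p_{G_n}, p_{G_0})/d(G_n, G_0) \to 0$.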
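The comparison between a Wasserstein distance and the surrogate quantity $d(G_n, G_0)$ rests on exhibiting one explicit coupling. The sketch below builds such a coupling for two discrete measures with the same number of atoms: mass $\min(p_i, q_i)$ stays on the diagonal and the leftover mass is matched greedily. This is an illustration only, with scalar atoms and hypothetical function names; the resulting cost is an upper bound on $W_1$, not necessarily $W_1$ itself.

```python
def diagonal_first_coupling(p, q):
    """Coupling of two weight vectors that puts mass min(p_i, q_i) on the diagonal
    and matches the leftover mass greedily (an arbitrary, possibly non-optimal choice)."""
    k = len(p)
    plan = [[0.0] * k for _ in range(k)]
    rest_p, rest_q = list(p), list(q)
    for i in range(k):
        plan[i][i] = min(p[i], q[i])
        rest_p[i] -= plan[i][i]
        rest_q[i] -= plan[i][i]
    for i in range(k):
        for j in range(k):
            moved = min(rest_p[i], rest_q[j])
            plan[i][j] += moved
            rest_p[i] -= moved
            rest_q[j] -= moved
    return plan

def transport_cost(plan, atoms_p, atoms_q):
    """Cost of a plan with ground distance |x - y| on scalar atoms; an upper bound on W_1."""
    return sum(plan[i][j] * abs(atoms_p[i] - atoms_q[j])
               for i in range(len(atoms_p)) for j in range(len(atoms_q)))

if __name__ == "__main__":
    p, atoms_p = [0.5, 0.5], [0.1, 1.9]      # a perturbed measure G_n (assumed values)
    q, atoms_q = [1 / 3, 2 / 3], [0.0, 2.0]  # the target measure G_0 (assumed values)
    plan = diagonal_first_coupling(p, q)
    print(plan, transport_cost(plan, atoms_p, atoms_q))
```

In the usage example the off-diagonal mass equals half of $\sum_i |p_i - q_i|$, which is the mechanism behind the weight term $|\Delta p_i^n|$ in $d(G_n, G_0)$.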


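Step 2. Now, consider the following important identity:

\[ p_{G_n}(x) - p_{G_0}(x) = \sum_{i=1}^{k_0} \Delta p_i^n f(x|\theta_i^0, \Sigma_i^0) + \sum_{i=1}^{k_0} p_i^n \left( f(x|\theta_i^n, \Sigma_i^n) - f(x|\theta_i^0, \Sigma_i^0) \right). \]

For each $x$, applying Taylor expansion to the function $f$ up to the first order, we obtain

\[ \sum_{i=1}^{k_0} p_i^n \left( f(x|\theta_i^n, \Sigma_i^n) - f(x|\theta_i^0, \Sigma_i^0) \right) = \sum_{i=1}^{k_0} p_i^n \left[ (\Delta\theta_i^n)^T \frac{\partial f}{\partial \theta}(x|\theta_i^0, \Sigma_i^0) + \mathrm{tr}\left( \frac{\partial f}{\partial \Sigma}(x|\theta_i^0, \Sigma_i^0)^T \Delta\Sigma_i^n \right) \right] + R_n(x), \]

where $R_n(x) = O\left( \sum_{i=1}^{k_0} p_i^n \left( \|\Delta\theta_i^n\|^{1+\delta_1} + \|\Delta\Sigma_i^n\|^{1+\delta_2} \right) \right)$. Here the appearance of $\delta_1$ and $\delta_2$ is due to the assumed Lipschitz conditions, and the big-O constant does not depend on $x$. It is clear that $\sup_x |R_n(x)/d(G_n, G_0)| \to 0$ as $n \to \infty$.

Denote

\[ A_n(x) = \sum_{i=1}^{k_0} p_i^n \left[ (\Delta\theta_i^n)^T \frac{\partial f}{\partial \theta}(x|\theta_i^0, \Sigma_i^0) + \mathrm{tr}\left( \frac{\partial f}{\partial \Sigma}(x|\theta_i^0, \Sigma_i^0)^T \Delta\Sigma_i^n \right) \right] \quad \text{and} \quad B_n(x) = \sum_{i=1}^{k_0} \Delta p_i^n f(x|\theta_i^0, \Sigma_i^0). \]

Then, we can rewrite

\[ (p_{G_n}(x) - p_{G_0}(x))/d(G_n, G_0) = (A_n(x) + B_n(x) + R_n(x))/d(G_n, G_0). \]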


Step 3. We see that $A_n(x)/d(G_n, G_0)$ and $B_n(x)/d(G_n, G_0)$ are linear combinations of the scalar elements of $f(x|\theta, \Sigma)$, $\frac{\partial f}{\partial \theta}(x|\theta, \Sigma)$ and $\frac{\partial f}{\partial \Sigma}(x|\theta, \Sigma)$ such that the coefficients do not depend on $x$. We shall argue that not all such coefficients in the linear combination converge to 0 as $n \to \infty$. Indeed, if the opposite is true, then the summation of the absolute values of these coefficients must also tend to 0:

\[ \left\{ \sum_{i=1}^{k_0} |\Delta p_i^n| + p_i^n \left( \|\Delta\theta_i^n\|_1 + \|\Delta\Sigma_i^n\|_1 \right) \right\} / d(G_n, G_0) \to 0. \]

Since the entrywise $\ell_1$ and $\ell_2$ norms are equivalent, the above entails $\left\{ \sum_{i=1}^{k_0} |\Delta p_i^n| + p_i^n \left( \|\Delta\theta_i^n\| + \|\Delta\Sigma_i^n\| \right) \right\} / d(G_n, G_0) \to 0$, which contradicts the definition of $d(G_n, G_0)$, under which this ratio equals 1. As a consequence, we can find at least one coefficient of the elements of $A_n(x)/d(G_n, G_0)$ or $B_n(x)/d(G_n, G_0)$ that does not vanish as $n \to \infty$.
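Step 4. Let $m_n$ be the maximum of the absolute values of the scalar coefficients of $A_n(x)/d(G_n, G_0)$, $B_n(x)/d(G_n, G_0)$ and $d_n = 1/m_n$; then $d_n$ is uniformly bounded from above for all $n$. Thus, as $n \to \infty$,

\[ d_n A_n(x)/d(G_n, G_0) \to \sum_{i=1}^{k_0} \beta_i^T \frac{\partial f}{\partial \theta}(x|\theta_i^0, \Sigma_i^0) + \mathrm{tr}\left( \frac{\partial f}{\partial \Sigma}(x|\theta_i^0, \Sigma_i^0)^T \gamma_i \right) \quad \text{and} \quad d_n B_n(x)/d(G_n, G_0) \to \sum_{i=1}^{k_0} \alpha_i f(x|\theta_i^0, \Sigma_i^0), \]

such that not all scalar elements of $\alpha_i$, $\beta_i$ and $\gamma_i$ vanish. Moreover, $\gamma_i$ are symmetric matrices because $\Sigma_i^n$ are symmetric matrices for all $n, i$. Note that

\[ d_n V(p_{G_n}, p_{G_0})/d(G_n, G_0) = \int d_n |p_{G_n}(x) - p_{G_0}(x)|/d(G_n, G_0)\,dx = \int d_n |A_n(x) + B_n(x) + R_n(x)|/d(G_n, G_0)\,dx \to 0. \]

By Fatou's lemma, the integrand in the above display vanishes for almost all $x$. Thus,

\[ \sum_{i=1}^{k_0} \alpha_i f(x|\theta_i^0, \Sigma_i^0) + \beta_i^T \frac{\partial f}{\partial \theta}(x|\theta_i^0, \Sigma_i^0) + \mathrm{tr}\left( \frac{\partial f}{\partial \Sigma}(x|\theta_i^0, \Sigma_i^0)^T \gamma_i \right) = 0 \quad \text{for almost all } x. \]

By the first-order identifiability criteria of $f$, we have $\alpha_i = 0$, $\beta_i = 0 \in \mathbb{R}^{d_1}$ and $\gamma_i = 0 \in \mathbb{R}^{d_2 \times d_2}$ for all $i = 1, \ldots, k_0$, which is a contradiction. Hence, (14) is proved.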


6.2 Strong identifiability in over-fitted mixtures 

PROOF OF THEOREM 3.2. (a) We only need to establish that

\[ \lim_{\epsilon \to 0} \inf_{G \in \mathcal{O}_k(\Theta \times \Omega)} \left\{ \sup_{x \in \mathcal{X}} |p_G(x) - p_{G_0}(x)| / W_2^2(G, G_0) : W_2(G, G_0) \le \epsilon \right\} > 0. \qquad (15) \]

The conclusion of the theorem follows from an application of Fatou's lemma in the same manner as Step 4 in the proof of Theorem 3.1.

Step 1. Suppose that (15) does not hold; then we can find a sequence $G_n \in \mathcal{O}_k(\Theta \times \Omega)$ tending to $G_0$ in $W_2$ distance such that $\sup_{x \in \mathcal{X}} |p_{G_n}(x) - p_{G_0}(x)| / W_2^2(G_n, G_0) \to 0$ as $n \to \infty$. Since $k$ is finite, there is some $k^* \in [k_0, k]$ such that there exists a subsequence of $G_n$ having exactly $k^*$ support points. We cannot have $k^* = k_0$, due to Theorem 3.1 and the fact that $W_2^2(G_n, G_0) \lesssim W_1(G_n, G_0)$ for all $n$. Thus, $k_0 + 1 \le k^* \le k$.

Write $G_n = \sum_{i=1}^{k^*} p_i^n \delta_{(\theta_i^n, \Sigma_i^n)}$ and $G_0 = \sum_{i=1}^{k_0} p_i^0 \delta_{(\theta_i^0, \Sigma_i^0)}$. Since $W_2(G_n, G_0) \to 0$, there exists a subsequence of $G_n$ such that each support point $(\theta_i^0, \Sigma_i^0)$ of $G_0$ is the limit of a subset of $s_i \ge 1$ support points of $G_n$. There may also be a subset of support points of $G_n$ whose limits are not among the support points of $G_0$; we assume there are $m \ge 0$ such limit points. To avoid notational clutter, we replace the subsequence of $G_n$ by the whole sequence $\{G_n\}$. By re-labeling the support points, $G_n$ can be expressed as

\[ G_n = \sum_{i=1}^{k_0+m} \sum_{j=1}^{s_i} p_{ij}^n \delta_{(\theta_{ij}^n, \Sigma_{ij}^n)} \xrightarrow{W_2} G_0 = \sum_{i=1}^{k_0+m} p_i^0 \delta_{(\theta_i^0, \Sigma_i^0)}, \]

where $(\theta_{ij}^n, \Sigma_{ij}^n) \to (\theta_i^0, \Sigma_i^0)$ for each $i = 1, \ldots, k_0 + m$, $j = 1, \ldots, s_i$, $p_i^0 = 0$ for $i > k_0$, and we have that $p_{i\cdot}^n := \sum_{j=1}^{s_i} p_{ij}^n \to p_i^0$ for all $i$. Moreover, the constraint $k_0 + 1 \le \sum_{i=1}^{k_0+m} s_i \le k$ must hold.
We note that if a matrix $\Sigma$ is (strictly) positive definite with maximum eigenvalue bounded (from above) by a constant $M$, then $\Sigma$ is also bounded under the entrywise $\ell_2$ norm. However if $\Sigma$ is only positive semidefinite, it can be singular and its $\ell_2$ norm potentially unbounded. In our context, for $i \ge k_0 + 1$ it is possible that the limiting matrices $\Sigma_i^0$ can be singular. This comes from the fact that some eigenvalues of $\Sigma_{ij}^n$ can go to 0 as $n \to \infty$, which implies $\det(\Sigma_{ij}^n) \to 0$ and hence $\det(\Sigma_i^0) = 0$. By re-labeling the support points, we may assume without loss of generality that $\Sigma_1^0, \ldots, \Sigma_{k_0+m_1}^0$ are (strictly) positive definite matrices and $\Sigma_{k_0+m_1+1}^0, \ldots, \Sigma_{k_0+m}^0$ are singular and positive semidefinite matrices for some $m_1 \in [0, m]$. For those singular matrices, we shall make use of the assumption that $\lim_{\lambda_1(\Sigma) \to 0} f(x|\theta, \Sigma) = 0$: accordingly, for each $x$, $f(x|\theta_{ij}^n, \Sigma_{ij}^n) \to 0$ as $n \to \infty$ for all $k_0 + m_1 + 1 \le i \le k_0 + m$, $1 \le j \le s_i$.


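Step 2. Using shorthand notations $\Delta\theta_{ij}^n := \theta_{ij}^n - \theta_i^0$, $\Delta\Sigma_{ij}^n := \Sigma_{ij}^n - \Sigma_i^0$ for $i = 1, \ldots, k_0 + m_1$ and $j = 1, \ldots, s_i$, it is simple to see that

\[ W_2^2(G_n, G_0) \lesssim d(G_n, G_0) := \sum_{i=1}^{k_0+m_1} \sum_{j=1}^{s_i} p_{ij}^n \left( \|\Delta\theta_{ij}^n\|^2 + \|\Delta\Sigma_{ij}^n\|^2 \right) + \sum_{i=1}^{k_0+m} |p_{i\cdot}^n - p_i^0|, \qquad (16) \]

because $W_2^2(G_n, G_0)$ is the optimal transport cost with respect to $\ell_2^2$, while $d(G_n, G_0)$ corresponds to a multiple of the cost of a possibly non-optimal transport plan, which is achieved by coupling the atoms $(\theta_{ij}^n, \Sigma_{ij}^n)$ for $j = 1, \ldots, s_i$ with $(\theta_i^0, \Sigma_i^0)$ by mass $\min(p_{i\cdot}^n, p_i^0)$, while the remaining masses are coupled arbitrarily. Since $\sup_{x \in \mathcal{X}} |p_{G_n}(x) - p_{G_0}(x)| / W_2^2(G_n, G_0)$ vanishes in the limit, so does $\sup_{x \in \mathcal{X}} |p_{G_n}(x) - p_{G_0}(x)| / d(G_n, G_0)$.

For each $x$, we make use of the key identity:

\[ p_{G_n}(x) - p_{G_0}(x) = \sum_{i=1}^{k_0+m_1} \sum_{j=1}^{s_i} p_{ij}^n \left( f(x|\theta_{ij}^n, \Sigma_{ij}^n) - f(x|\theta_i^0, \Sigma_i^0) \right) + \sum_{i=1}^{k_0+m_1} (p_{i\cdot}^n - p_i^0) f(x|\theta_i^0, \Sigma_i^0) \]
\[ + \sum_{i=k_0+m_1+1}^{k_0+m} \sum_{j=1}^{s_i} p_{ij}^n f(x|\theta_{ij}^n, \Sigma_{ij}^n) := A_n(x) + B_n(x) + C_n(x). \qquad (17) \]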

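Step 3. By means of Taylor expansion up to the second order:

\[ A_n(x) = \sum_{i=1}^{k_0+m_1} \sum_{j=1}^{s_i} p_{ij}^n \left( f(x|\theta_{ij}^n, \Sigma_{ij}^n) - f(x|\theta_i^0, \Sigma_i^0) \right) = \sum_{i=1}^{k_0+m_1} \sum_{\alpha} A_{\alpha_1,\alpha_2}^n(\theta_i^0, \Sigma_i^0) + R_n(x), \]

where $\alpha = (\alpha_1, \alpha_2)$ such that $\alpha_1 + \alpha_2 \in \{1, 2\}$. Specifically,

\[ A_{1,0}^n(\theta_i^0, \Sigma_i^0) = \sum_{j=1}^{s_i} p_{ij}^n (\Delta\theta_{ij}^n)^T \frac{\partial f}{\partial \theta}(x|\theta_i^0, \Sigma_i^0), \]
\[ A_{0,1}^n(\theta_i^0, \Sigma_i^0) = \sum_{j=1}^{s_i} p_{ij}^n\, \mathrm{tr}\left( \frac{\partial f}{\partial \Sigma}(x|\theta_i^0, \Sigma_i^0)^T \Delta\Sigma_{ij}^n \right), \]
\[ A_{2,0}^n(\theta_i^0, \Sigma_i^0) = \frac{1}{2} \sum_{j=1}^{s_i} p_{ij}^n (\Delta\theta_{ij}^n)^T \frac{\partial^2 f}{\partial \theta^2}(x|\theta_i^0, \Sigma_i^0) \Delta\theta_{ij}^n, \]
\[ A_{0,2}^n(\theta_i^0, \Sigma_i^0) = \frac{1}{2} \sum_{j=1}^{s_i} p_{ij}^n\, \mathrm{tr}\left( \frac{\partial}{\partial \Sigma}\left( \mathrm{tr}\left( \frac{\partial f}{\partial \Sigma}(x|\theta_i^0, \Sigma_i^0)^T \Delta\Sigma_{ij}^n \right) \right)^T \Delta\Sigma_{ij}^n \right), \]
\[ A_{1,1}^n(\theta_i^0, \Sigma_i^0) = \sum_{j=1}^{s_i} p_{ij}^n (\Delta\theta_{ij}^n)^T \frac{\partial}{\partial \theta}\left( \mathrm{tr}\left( \frac{\partial f}{\partial \Sigma}(x|\theta_i^0, \Sigma_i^0)^T \Delta\Sigma_{ij}^n \right) \right). \]

In addition, $R_n(x) = O\left( \sum_{i=1}^{k_0+m_1} \sum_{j=1}^{s_i} p_{ij}^n \left( \|\Delta\theta_{ij}^n\|^{2+\delta_1} + \|\Delta\Sigma_{ij}^n\|^{2+\delta_2} \right) \right)$ due to the second-order Lipschitz condition. It is clear that $\sup_x |R_n(x)|/d(G_n, G_0) \to 0$ as $n \to \infty$.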

Step 4. Write Dn '■= d{Gn, Go) for short. Note that (pcni^) ~ PGoi^))/Dn is a linear combination 
of the scalar elements of /(x| 0 , S) and its derivatives taken with respect to 0 and S up to the second 
order, and evaluated at the distinct pairs (0?, S?) for f = 1,..., fco -|- mi. (To be specific, the elements 

of f{x\e, S), ^{x\e, S), ^{x\e, S), ^{x\e, S), 0 (x| 0 , S), ^{x\9, S), and ^(x\0, S)). 
In addition, the coefficients associated with these elements do not depend on x. As in the proof of 
Theorem 13.11 we shall argue that not all such coefficients vanish as n —)■ 00 . Indeed, if this is not true, 
then by taking the summation of all the absolute value of the coefficients associated with the elements 

f f 

of -t^{x\6) as 1 < f < di and for 1 < tt, u < d 2 , we obtain 


so? 


ko+mi Si 

E EfBdi^o; 

i=l j=l 


""2 + ||ASi,-|| 2 )/Z), 


^3 


ko-\-m 

Therefore, E \p 7 ~ P^\/Dn —>• 1 as n —)■ 00 . It implies that we should have at least one coefficient 

i=l 

associated with a f{x\6) (appearing in Bn{x)/Dn) does not converge to 0 as re 00 , which is a 
contradiction. As a consequence, not all the coefficients vanish to 0. 


Step 5. Let $m_n$ be the maximum of the absolute values of the aforementioned coefficients, and set $d_n = 1/m_n$. Then, $d_n$ is uniformly bounded above when $n$ is sufficiently large. Therefore, as $n \to \infty$, we obtain


dnBnix') / Dji 

fco+mi 

dn ^ Al^{elTP)/Dr, 

i=l 

kQ-\-mi 

i=l 

ko+mi 

dn Y ^")/Dn 

i=l 

ko+mi 

dn Y ^0,2(^°,S0)/Z)„ 

i=l 

ko+mi 

dn Y AlA0^,^°)/Dn 

i=l 

where a* E M, Pi, vn,... ,Visi E 


fco+mi 

a,/(x| 0 O,sO), 

i=l 

fco+mi p.p 
2=1 

fc p+m i /Of \ 

i=l ^ ^ 

fco+mi Si „2 ^ 

i=l j=l 
ko+mi Si 

E E‘^ 


i=l j=l 

ko+mi Si 

E E -5 

1=1 j=i 


A 




Vij 




5 


for all 1 < 


'yi,rjii,..., rjis^ are symmetric matrices in 

ko+m Si 

i <kQ + mi,l < j < Si. Additionally, dnCn{x)lDn = D-^ ^ £ dnP^f{x\e'pj, S"-) 0 

i=fco+mi + li=l 

due to the fact that /(x|0^-, E^) —>• 0 for all A:o +mi +1 < i < feo + m, 1 < j < Sj. As a consequence, 
we obtain for all x that 

kQ-\-Tn,\ ^ p\ p ^‘2 p 

Y {«J(x|0°,sO) + /3f^(x|0,^sO) + £i.5^(x|0,^sOK + 

2=1 ^ 7 = 1 


^ / j=i 


d 


df 


^{ti{—ix\9lJ:^ifvij 


9S 




+ 


= 0 . 


From the second-order identifiability of $\{f(x|\theta, \Sigma), \theta \in \Theta, \Sigma \in \Omega\}$, we obtain $\alpha_i = 0$, $\beta_i = \nu_{i1} = \ldots = \nu_{is_i} = 0 \in \mathbb{R}^{d_1}$, and $\gamma_i = \eta_{i1} = \ldots = \eta_{is_i} = 0 \in \mathbb{R}^{d_2 \times d_2}$ for all $1 \le i \le k_0 + m_1$, which is a contradiction to the fact that not all coefficients go to 0 as $n \to \infty$. This concludes the proof of Eq. (15) and that of the theorem.

ko 

(b) Recall Gq = Fi^(09,s9)- Construct a sequence of probability measures Gn having exactly 

2=1 * * 

^ 0+1 \ X 

/cg + l support points as follows: Gn= Yl Pi'dtgn where Of = 0 ^ -1^^, @2 = + = 

i—l \ z ^ I ! fj YJ 


S5- Id2 and S 2 = S5 -I—1^2■ Here, 1^,2 denotes identity matrix in M‘^ 2 xd 2 ^ vector with 

all elements being equal to 1. In addition, [OYn'^^+i) = for all i = 2,...,ko. Also, 

Pi = P2 — and pYi = P^ for alH = 2,..., k^. It is simple to verify that En '■= VFf (Gn, Gg) = 

{PiY 


sn _ 0O|| + ll^n _ 0O|| + ll^n _ ^ _ 5.0||)r- = 


n' 


n' 
By means of Taylor's expansion up to the first order, we get that as $n \to \infty$


y{PGu,PGo) 


f/ 

^ Jx^X 


j2 E + 

i=l 01,02 


f |i?i(x)| dx, 
x&X 


dx 


where ai E , 02 E in the sum such that | ai | +1 02 1 = 1, -Ri is Taylor expansion’s remainder. 

2 

The second equality in the above equation is due to ^ = Q for each ai, 02 such 

i=l 

that I ail + |a 2 | = 1. Since / is second-order differentiable with respect to 9, S, Ri(x) takes the form 

i=l |o|=2 -(f 


where a = ( 01 , 02 ). Note that, = 0(n ^). Additionally, from the hypothesis, 

i=l 

d^f 


sup 

tg[0,l] 


x&X 


(x|0)' + fA0^„S? + fAS^J 


(96'"i5E“2 


dx < 00 . It follows that J |Ri(x)| dx = 0{n 


- 2 \ 


So for any r < 2, V{pg^,Pgo) — o(Wi{Gn, Go))- This concludes the proof, 
(c) Continuing with the same sequence Gn constructed in part (b), we have 


h'^iPGr, 


,PGo) - 


2 p; / 

x&X 


iPGnjx) -pGoix)f 


dx < 


Rf(x) 


xeX 




dx. 


where the first inequality is due to \/pgJ^ + VPGoix) > s/PGoix) > \/Pif{x\9i, S'j’) and the 
second inequality is because of Taylor expansion taken to the first order. The proof proceeds in the 
same manner as that of part (b). 


6.3 Proofs for over-fitted Gaussian mixtures 

PROOF OF THEOREM 4.1. For ease of exposition, we consider the setting of univariate location-scale Gaussian distributions, i.e., both $\theta$ and $\Sigma = \sigma^2$ are scalars. The proof for general $d \ge 1$ is similar and can be found in Appendix II. Let $v = \sigma^2$, so we write $G_0 = \sum_{i=1}^{k_0} p_i^0 \delta_{(\theta_i^0, v_i^0)}$.

Step 1. For any sequence $G_n \in \mathcal{O}_{k,c_0}(\Theta \times \Omega) \to G_0$ in $W_{\overline{r}}$, by employing the same subsequencing argument as in the second paragraph of the proof of Theorem 3.2, we can represent without loss of generality

\[ G_n = \sum_{i=1}^{k_0} \sum_{j=1}^{s_i} p_{ij}^n \delta_{(\theta_{ij}^n, v_{ij}^n)}, \qquad (18) \]

where $(p_{i\cdot}^n, \theta_{ij}^n, v_{ij}^n) \to (p_i^0, \theta_i^0, v_i^0)$ for all $i = 1, \ldots, k_0$ and $j = 1, \ldots, s_i$, and where $s_1, \ldots, s_{k_0}$ are some natural constants less than $k$. All $G_n$ have exactly the same number $\sum s_i \le k$ of support points.
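Step 2. For any $x \in \mathbb{R}$,

\[ p_{G_n}(x) - p_{G_0}(x) = \sum_{i=1}^{k_0} \sum_{j=1}^{s_i} p_{ij}^n \left( f(x|\theta_{ij}^n, v_{ij}^n) - f(x|\theta_i^0, v_i^0) \right) + \sum_{i=1}^{k_0} (p_{i\cdot}^n - p_i^0) f(x|\theta_i^0, v_i^0), \]

where $p_{i\cdot}^n := \sum_{j=1}^{s_i} p_{ij}^n$, and $p_i^0 = 0$ for any $i \ge k_0 + 1$. For any $r \ge 1$, integer $N \ge r$ and $x \in \mathbb{R}$, by means of Taylor expansion up to order $N$, we obtain

\[ p_{G_n}(x) - p_{G_0}(x) = \sum_{i=1}^{k_0} \sum_{j=1}^{s_i} p_{ij}^n \sum_{|\alpha|=1}^{N} (\Delta\theta_{ij}^n)^{\alpha_1} (\Delta v_{ij}^n)^{\alpha_2} \frac{1}{\alpha!} \frac{\partial^{|\alpha|} f}{\partial \theta^{\alpha_1} \partial v^{\alpha_2}}(x|\theta_i^0, v_i^0) + A_1(x) + R_1(x). \qquad (19) \]

Here, $\alpha = (\alpha_1, \alpha_2)$, $|\alpha| = \alpha_1 + \alpha_2$, $\alpha! = \alpha_1! \alpha_2!$. Additionally, $A_1(x) = \sum_{i=1}^{k_0} (p_{i\cdot}^n - p_i^0) f(x|\theta_i^0, v_i^0)$, and $R_1(x) = O\left( \sum_{i=1}^{k_0} \sum_{j=1}^{s_i} p_{ij}^n \left( |\Delta\theta_{ij}^n|^{N+\delta} + |\Delta v_{ij}^n|^{N+\delta} \right) \right)$.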

i=ij=i 


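Step 3. Enter the key identity (7) (cf. Lemma 7.1): $\frac{\partial^2 f}{\partial\theta^2}(x|\theta, v) = 2\frac{\partial f}{\partial v}(x|\theta, v)$ for all $x$. This entails, for any natural orders $n_1, n_2$, that
\[
\frac{\partial^{n_1+n_2} f}{\partial\theta^{n_1}\partial v^{n_2}}(x|\theta,v) = \frac{1}{2^{n_2}}\frac{\partial^{n_1+2n_2} f}{\partial\theta^{n_1+2n_2}}(x|\theta,v).
\]
Thus, by converting all derivatives to those taken with respect to only $\theta$, we may rewrite (19) as
\[
p_{G_n}(x) - p_{G_0}(x) = \sum_{i=1}^{k_0}\sum_{j=1}^{s_i} p_{ij}^n \sum_{\alpha\ge 1}\sum_{n_1,n_2} \frac{(\Delta\theta_{ij}^n)^{n_1}(\Delta v_{ij}^n)^{n_2}}{2^{n_2}n_1!n_2!}\,\frac{\partial^{\alpha} f}{\partial\theta^{\alpha}}(x|\theta_i^0, v_i^0) + A_1(x) + R_1(x) := A_1(x) + B_1(x) + R_1(x), \quad (20)
\]
where $n_1, n_2$ in the sum satisfy $n_1 + 2n_2 = \alpha$, $n_1 + n_2 \le N$.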
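As a numerical sanity check of the identity used in Step 3, the following sketch compares a finite-difference approximation of $\partial^2 f/\partial\theta^2$ with $2\,\partial f/\partial v$ for the univariate Gaussian density. The function names, step sizes and evaluation points are illustrative assumptions and do not come from the paper.

```python
import math


def gaussian_pdf(x, theta, v):
    """Univariate Gaussian density with location theta and variance v."""
    return math.exp(-(x - theta) ** 2 / (2.0 * v)) / math.sqrt(2.0 * math.pi * v)


def second_derivative_in_theta(x, theta, v, h=1e-3):
    """Central second difference approximating d^2 f / d theta^2."""
    return (
        gaussian_pdf(x, theta + h, v)
        - 2.0 * gaussian_pdf(x, theta, v)
        + gaussian_pdf(x, theta - h, v)
    ) / h ** 2


def derivative_in_v(x, theta, v, h=1e-6):
    """Central first difference approximating d f / d v."""
    return (gaussian_pdf(x, theta, v + h) - gaussian_pdf(x, theta, v - h)) / (2.0 * h)


def heat_identity_gap(x, theta, v):
    """Absolute gap between d^2 f / d theta^2 and 2 * d f / d v."""
    return abs(second_derivative_in_theta(x, theta, v) - 2.0 * derivative_in_v(x, theta, v))
```

The gap should be of the order of the finite-difference error only, which is what allows every mixed derivative in (19) to be traded for a pure derivative in $\theta$.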


Step 4. We proceed to proving part (a) of the theorem. From the definition of $\overline{r}$, by setting $r = \overline{r} - 1$, there exist non-trivial solutions $(c_i^*, a_i^*, b_i^*)_{i=1}^{k-k_0+1}$ for the system of equations (8). Construct a sequence of probability measures $G_n \in \mathcal{O}_k(\Theta\times\Omega)$ under the representation given by Eq. (18) as follows:
\[
\theta_{1j}^n = \theta_1^0 + \frac{a_j^*}{n}, \quad v_{1j}^n = v_1^0 + \frac{2b_j^*}{n^2}, \quad p_{1j}^n = \frac{p_1^0 (c_j^*)^2}{\sum_{i=1}^{k-k_0+1}(c_i^*)^2}, \quad \text{for all } j = 1,\ldots,k-k_0+1,
\]
and $\theta_{i1}^n = \theta_i^0$, $v_{i1}^n = v_i^0$, $p_{i1}^n = p_i^0$ for all $i = 2,\ldots,k_0$. (That is, we set $s_1 = k - k_0 + 1$, $s_i = 1$ for all $2 \le i \le k_0$.) Note that $b_j^*$ may be negative, but we are guaranteed that $v_{1j}^n > 0$ for sufficiently large $n$. It is easy to verify that
\[
W_1(G_n, G_0) = \sum_{i=1}^{k-k_0+1} p_{1i}^n\left(\frac{|a_i^*|}{n} + \frac{2|b_i^*|}{n^2}\right) \asymp \frac{1}{n},
\]
because at least one of the $a_i^*$ is non-zero.
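Step 5. Select $N = \overline{r}$ in Eq. (20). By our construction of $G_n$, clearly $A_1(x) = 0$. Moreover,
\[
B_1(x) = \sum_{i=1}^{k-k_0+1} p_{1i}^n \sum_{\alpha=1}^{\overline{r}-1}\sum_{n_1,n_2} \frac{(\Delta\theta_{1i}^n)^{n_1}(\Delta v_{1i}^n)^{n_2}}{2^{n_2}n_1!n_2!}\frac{\partial^{\alpha} f}{\partial\theta^{\alpha}}(x|\theta_1^0, v_1^0) + \sum_{i=1}^{k-k_0+1} p_{1i}^n \sum_{\alpha=\overline{r}}^{2\overline{r}}\sum_{n_1,n_2} \frac{(\Delta\theta_{1i}^n)^{n_1}(\Delta v_{1i}^n)^{n_2}}{2^{n_2}n_1!n_2!}\frac{\partial^{\alpha} f}{\partial\theta^{\alpha}}(x|\theta_1^0, v_1^0)
\]
\[
= \sum_{\alpha=1}^{\overline{r}-1} B_{\alpha n}\frac{\partial^{\alpha} f}{\partial\theta^{\alpha}}(x|\theta_1^0, v_1^0) + \sum_{\alpha\ge\overline{r}} C_{\alpha n}\frac{\partial^{\alpha} f}{\partial\theta^{\alpha}}(x|\theta_1^0, v_1^0).
\]
In the above display, for each $\alpha \ge \overline{r}$, observe that $C_{\alpha n} = O(n^{-\alpha})$. Moreover, for each $1 \le \alpha \le \overline{r} - 1$,
\[
B_{\alpha n} = \frac{1}{n^{\alpha}}\,\frac{p_1^0}{\sum_{i=1}^{k-k_0+1}(c_i^*)^2}\sum_{i=1}^{k-k_0+1}(c_i^*)^2\sum_{n_1+2n_2=\alpha}\frac{(a_i^*)^{n_1}(b_i^*)^{n_2}}{n_1!n_2!} = 0,
\]
because $(c_i^*, a_i^*, b_i^*)_{i=1}^{k-k_0+1}$ form a non-trivial solution to system (8).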


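Step 6. We arrive at an upper bound for the Hellinger distance of mixture densities:
\[
h^2(p_{G_n}, p_{G_0}) \le \frac{1}{2p_1^0}\int_{\mathbb{R}} \frac{(p_{G_n}(x) - p_{G_0}(x))^2}{f(x|\theta_1^0, v_1^0)}\,dx \lesssim \int_{\mathbb{R}} \frac{\sum_{\alpha=\overline{r}}^{2\overline{r}} C_{\alpha n}^2\left(\frac{\partial^{\alpha} f}{\partial\theta^{\alpha}}(x|\theta_1^0, v_1^0)\right)^2 + R_1^2(x)}{f(x|\theta_1^0, v_1^0)}\,dx.
\]
For Gaussian densities, it can be verified that $\left(\frac{\partial^{\alpha} f}{\partial\theta^{\alpha}}(x|\theta_1^0, v_1^0)\right)^2 / f(x|\theta_1^0, v_1^0)$ is integrable for all $1 \le \alpha \le 2\overline{r}$. So, $h^2(p_{G_n}, p_{G_0}) \le O(n^{-2\overline{r}}) + \int R_1^2(x)/f(x|\theta_1^0, v_1^0)\,dx$. Turning to the Taylor remainder $R_1(x)$, note that
\[
|R_1(x)| \lesssim \sum_{i=1}^{k-k_0+1}\sum_{|\beta|=\overline{r}+1} \frac{\overline{r}+1}{\beta!}|\Delta\theta_{1i}^n|^{\beta_1}|\Delta v_{1i}^n|^{\beta_2}\left|\int_0^1 (1-t)^{\overline{r}}\frac{\partial^{\overline{r}+1} f}{\partial\theta^{\beta_1}\partial v^{\beta_2}}(x|\theta_1^0 + t\Delta\theta_{1i}^n, v_1^0 + t\Delta v_{1i}^n)\,dt\right|.
\]
Now, $|\Delta\theta_{1i}^n|^{\beta_1}|\Delta v_{1i}^n|^{\beta_2} \asymp n^{-\beta_1-2\beta_2} = o(n^{-\overline{r}})$. In addition, as $n$ is sufficiently large, we have for all $|\beta| = \overline{r} + 1$ that
\[
\sup_{t\in[0,1]}\int_{\mathbb{R}}\left(\frac{\partial^{\overline{r}+1} f}{\partial\theta^{\beta_1}\partial v^{\beta_2}}(x|\theta_1^0 + t\Delta\theta_{1i}^n, v_1^0 + t\Delta v_{1i}^n)\right)^2 / f(x|\theta_1^0, v_1^0)\,dx < \infty.
\]
It follows that $h(p_{G_n}, p_{G_0}) = O(n^{-\overline{r}})$. As noted above, $W_1(G_n, G_0) \asymp n^{-1}$, so the claim of part (a) is established.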
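The construction of Steps 4 to 6 can be illustrated numerically in the simplest over-fitted case. The sketch below is an illustration under stated assumptions, not the paper's simulation: it takes $G_0 = \delta_{(0,1)}$ (a single standard Gaussian), $k - k_0 = 1$, and the hypothetical choice $c_1 = c_2 = 1$, $a_1 = -a_2 = 1$, $b_1 = b_2 = -1/2$, which satisfies the first three equations of the system. The two fitted atoms are then $(\pm 1/n,\ 1 - 1/n^2)$ with equal weights, and the sup-norm gap between the densities should shrink like $n^{-4}$ while $W_1(G_n, G_0) \asymp n^{-1}$.

```python
import math


def gaussian_pdf(x, theta, v):
    """Univariate Gaussian density with location theta and variance v."""
    return math.exp(-(x - theta) ** 2 / (2.0 * v)) / math.sqrt(2.0 * math.pi * v)


def sup_density_gap(n, grid_points=2001, half_width=8.0):
    """Maximum over a grid of |p_{G_n}(x) - p_{G_0}(x)| for G_0 = N(0, 1).

    G_n places weight 1/2 on each of (1/n, 1 - 1/n^2) and (-1/n, 1 - 1/n^2),
    i.e. a_j = +/-1 and 2 * b_j = -1 in the construction of Step 4 (assumed values).
    """
    shift = 1.0 / n
    var = 1.0 - 1.0 / n ** 2
    gap = 0.0
    for k in range(grid_points):
        x = -half_width + 2.0 * half_width * k / (grid_points - 1)
        p_n = 0.5 * gaussian_pdf(x, shift, var) + 0.5 * gaussian_pdf(x, -shift, var)
        gap = max(gap, abs(p_n - gaussian_pdf(x, 0.0, 1.0)))
    return gap


def gap_ratio(n):
    """Ratio of gaps at n and 2n; close to 2**4 if the gap decays like n**-4."""
    return sup_density_gap(n) / sup_density_gap(2 * n)
```

Doubling $n$ should divide the gap by roughly $2^4 = 16$, consistent with the $O(n^{-\overline{r}})$ bound when $\overline{r} = 4$.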
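Step 7. Turning to part (b) of Theorem 4.1, it suffices to show that
\[
\lim_{\epsilon\to 0}\inf_{G\in\mathcal{O}_{k,c_0}(\Theta\times\Omega)}\left\{\sup_{x\in\mathcal{X}} |p_G(x) - p_{G_0}(x)| / W_{\overline{r}}^{\overline{r}}(G, G_0) : W_{\overline{r}}(G, G_0) \le \epsilon\right\} > 0. \quad (21)
\]
Then one can arrive at the theorem's claim by passing through a standard argument using Fatou's lemma (cf. Step 4 in the proof of Theorem 3.1). Suppose that (21) does not hold. Then we can find a sequence of probability measures $G_n \in \mathcal{O}_{k,c_0}(\Theta\times\Omega)$ that are represented by Eq. (18), such that $W_{\overline{r}}^{\overline{r}}(G_n, G_0) \to 0$ and $\sup_x |p_{G_n}(x) - p_{G_0}(x)| / W_{\overline{r}}^{\overline{r}}(G_n, G_0) \to 0$. Define
\[
D_n := d(G_n, G_0) := \sum_{i=1}^{k_0}\sum_{j=1}^{s_i} p_{ij}^n\left(|\Delta\theta_{ij}^n|^{\overline{r}} + |\Delta v_{ij}^n|^{\overline{r}}\right) + \sum_{i=1}^{k_0}|p_{i\cdot}^n - p_i^0|.
\]
Since $W_{\overline{r}}^{\overline{r}}(G_n, G_0) \lesssim D_n$, for all $x \in \mathbb{R}$, $(p_{G_n}(x) - p_{G_0}(x))/D_n \to 0$. Combining this fact with (20), where $N = \overline{r}$, we obtain
\[
(A_1(x) + B_1(x) + R_1(x))/D_n \to 0. \quad (22)
\]
We have $R_1(x)/D_n = o(1)$ as $n \to \infty$.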


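Step 8. $A_1(x)/D_n$ and $B_1(x)/D_n$ are linear combinations of elements of $\frac{\partial^{\alpha} f}{\partial\theta^{\alpha}}(x|\theta, v)$ where $\alpha = n_1 + 2n_2$ and $n_1 + n_2 \le \overline{r}$. Note that the natural order $\alpha$ ranges in $[0, 2\overline{r}]$. Let $E_{\alpha}(\theta, v)$ denote the corresponding coefficient of $\frac{\partial^{\alpha} f}{\partial\theta^{\alpha}}(x|\theta, v)$. Extracting from (20), for $\alpha = 0$, $E_0(\theta_i^0, v_i^0) = (p_{i\cdot}^n - p_i^0)/D_n$. For $\alpha \ge 1$,
\[
E_{\alpha}(\theta_i^0, v_i^0) = \left[\sum_{j=1}^{s_i} p_{ij}^n \sum_{\substack{n_1+2n_2=\alpha \\ n_1+n_2\le\overline{r}}} \frac{(\Delta\theta_{ij}^n)^{n_1}(\Delta v_{ij}^n)^{n_2}}{2^{n_2}n_1!n_2!}\right] / D_n.
\]
Suppose that $E_{\alpha}(\theta_i^0, v_i^0) \to 0$ for all $i = 1,\ldots,k_0$ and $0 \le \alpha \le 2\overline{r}$ as $n \to \infty$. By taking the summation of all $|E_0(\theta_i^0, v_i^0)|$, we get $\sum_{i=1}^{k_0}|p_{i\cdot}^n - p_i^0|/D_n \to 0$ as $n \to \infty$. As a consequence, we get
\[
\sum_{i=1}^{k_0}\sum_{j=1}^{s_i} p_{ij}^n\left(|\Delta\theta_{ij}^n|^{\overline{r}} + |\Delta v_{ij}^n|^{\overline{r}}\right)/D_n \to 1 \text{ as } n \to \infty.
\]
Hence, we can find an index $i^* \in \{1, 2, \ldots, k_0\}$ such that $\sum_{j=1}^{s_{i^*}} p_{i^*j}^n\left(|\Delta\theta_{i^*j}^n|^{\overline{r}} + |\Delta v_{i^*j}^n|^{\overline{r}}\right)/D_n \not\to 0$ as $n \to \infty$. Without loss of generality, we assume that $i^* = 1$. Accordingly,
\[
F_{\alpha}(\theta_1^0, v_1^0) := \frac{D_n E_{\alpha}(\theta_1^0, v_1^0)}{\sum_{j=1}^{s_1} p_{1j}^n\left(|\Delta\theta_{1j}^n|^{\overline{r}} + |\Delta v_{1j}^n|^{\overline{r}}\right)} = \frac{\sum_{j=1}^{s_1} p_{1j}^n \sum_{\substack{n_1+2n_2=\alpha \\ n_1+n_2\le\overline{r}}} \frac{(\Delta\theta_{1j}^n)^{n_1}(\Delta v_{1j}^n)^{n_2}}{2^{n_2}n_1!n_2!}}{\sum_{j=1}^{s_1} p_{1j}^n\left(|\Delta\theta_{1j}^n|^{\overline{r}} + |\Delta v_{1j}^n|^{\overline{r}}\right)} \to 0.
\]
If $s_1 = 1$ then $F_1(\theta_1^0, v_1^0)$ and $F_2(\theta_1^0, v_1^0)$ yield $|\Delta\theta_{11}^n|^{\overline{r}}/(|\Delta\theta_{11}^n|^{\overline{r}} + |\Delta v_{11}^n|^{\overline{r}}) \to 0$ and $|\Delta v_{11}^n|^{\overline{r}}/(|\Delta\theta_{11}^n|^{\overline{r}} + |\Delta v_{11}^n|^{\overline{r}}) \to 0$, which is a contradiction. As a consequence, $s_1 \ge 2$.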
Denote $\overline{p}_n = \max_{1\le j\le s_1}\{p_{1j}^n\}$ and $\overline{M}_n = \max\{|\Delta\theta_{11}^n|, \ldots, |\Delta\theta_{1s_1}^n|, |\Delta v_{11}^n|^{1/2}, \ldots, |\Delta v_{1s_1}^n|^{1/2}\}$. Since $0 < p_{1j}^n/\overline{p}_n \le 1$ for $1 \le j \le s_1$, by a subsequence argument, there exist $c_j^2 := \lim p_{1j}^n/\overline{p}_n$ for all $j = 1, \ldots, s_1$. Similarly, define $a_j := \lim \Delta\theta_{1j}^n/\overline{M}_n$ and $2b_j := \lim \Delta v_{1j}^n/\overline{M}_n^2$ for each $j = 1, \ldots, s_1$. By the constraints of $\mathcal{O}_{k,c_0}$, $p_{1j}^n \ge c_0$, so all of $c_j^2$ differ from 0 and at least one of them equals 1. Likewise, at least one element of $\{a_j, 2b_j\}_{j=1}^{s_1}$ is equal to $-1$ or $1$. Now, for each $\alpha = 1, \ldots, \overline{r}$, divide both the numerator and denominator of $F_{\alpha}(\theta_1^0, v_1^0)$ by $\overline{p}_n$ and then by $\overline{M}_n^{\alpha}$, and let $n \to \infty$; we obtain the following system of polynomial equations
\[
\sum_{j=1}^{s_1}\sum_{n_1+2n_2=\alpha} \frac{c_j^2 a_j^{n_1} b_j^{n_2}}{n_1!n_2!} = 0 \quad \text{for each } \alpha = 1, \ldots, \overline{r}.
\]
Since $s_1 \ge 2$, we get $\overline{r} \ge 4$. If $a_j = 0$ for all $1 \le j \le s_1$ then by choosing $\alpha = 4$, we obtain $\sum_{j=1}^{s_1} c_j^2 b_j^2 = 0$. This forces $b_j = 0$ for all $1 \le j \le s_1$, a contradiction to the fact that at least one element of $(a_j, b_j)_{j=1}^{s_1}$ is different from 0. Therefore, at least one element of $(a_j)_{j=1}^{s_1}$ is not equal to 0. Observe that $s_1 \le k - k_0 + 1$ (because the number of distinct atoms of $G_n$ is $\sum s_i \le k$ and all $s_i \ge 1$). Thus, the existence of non-trivial solutions for the system of equations given in the above display entails the existence of non-trivial solutions for the system of equations (8). This contradicts the definition of $\overline{r}$. Therefore, our hypothesis that all coefficients $E_{\alpha}(\theta_i^0, v_i^0)$ vanish does not hold: there must be at least one which does not converge to 0 as $n \to \infty$.


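Step 9. Let $m_n$ be the maximum of the absolute values of $E_{\alpha}(\theta_i^0, v_i^0)$ where $0 \le \alpha \le 2\overline{r}$, $1 \le i \le k_0$, and let $d_n = 1/m_n$. As $m_n \not\to 0$ as $n \to \infty$, $d_n$ is uniformly bounded above for all $n$. As $d_n|E_{\alpha}(\theta_i^0, v_i^0)| \le 1$, we have $d_n E_{\alpha}(\theta_i^0, v_i^0) \to \beta_{i\alpha}$ for all $0 \le \alpha \le 2\overline{r}$, $1 \le i \le k_0$, where at least one of $\beta_{i\alpha}$ differs from 0. Incorporating these limits into Eq. (22), we obtain that for all $x \in \mathbb{R}$,
\[
d_n(p_{G_n}(x) - p_{G_0}(x))/D_n \to \sum_{i=1}^{k_0}\sum_{\alpha=0}^{2\overline{r}} \beta_{i\alpha}\frac{\partial^{\alpha} f}{\partial\theta^{\alpha}}(x|\theta_i^0, v_i^0) = 0.
\]
By direct calculation, we can rewrite the above equation as
\[
\sum_{i=1}^{k_0}\sum_{j=1}^{2\overline{r}+1} \gamma_{ij}(x - \theta_i^0)^{j-1}\exp\left(-\frac{(x - \theta_i^0)^2}{2v_i^0}\right) = 0 \quad \text{for all } x \in \mathbb{R},
\]
where $\gamma_{ij}$ for odd $j$ are linear combinations of $\beta_{i(2l_1)}$ for $(j-1)/2 \le l_1 \le \overline{r}$, such that all of the coefficients are functions of $v_i^0$ differing from 0. For even $j$, $\gamma_{ij}$ are linear combinations of $\beta_{i(2l_2-1)}$ for $j/2 \le l_2 \le \overline{r}$, such that all of the coefficients are functions of $v_i^0$ differing from 0. Employing the same argument as that of part (a) of Theorem 3.4, we obtain $\gamma_{ij} = 0$ for all $i = 1, \ldots, k_0$, $j = 1, \ldots, 2\overline{r}+1$. This entails that $\beta_{i\alpha} = 0$ for all $i = 1, \ldots, k_0$, $\alpha = 0, \ldots, 2\overline{r}$, which is a contradiction. Thus we achieve the conclusion of (21).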


PROOF OF PROPOSITION m Our proof is based on Groebner bases method for determining 
solutions for a system of polynomial equations. (i) For the case $k - k_0 = 1$, the system (8) when $r = 4$ can be written as
\[
c_1^2 a_1 + c_2^2 a_2 = 0 \quad (23)
\]
\[
\frac{1}{2}(c_1^2 a_1^2 + c_2^2 a_2^2) + c_1^2 b_1 + c_2^2 b_2 = 0 \quad (24)
\]
\[
\frac{1}{3!}(c_1^2 a_1^3 + c_2^2 a_2^3) + c_1^2 a_1 b_1 + c_2^2 a_2 b_2 = 0 \quad (25)
\]
\[
\frac{1}{4!}(c_1^2 a_1^4 + c_2^2 a_2^4) + \frac{1}{2!}(c_1^2 a_1^2 b_1 + c_2^2 a_2^2 b_2) + \frac{1}{2!}(c_1^2 b_1^2 + c_2^2 b_2^2) = 0 \quad (26)
\]
Suppose that the above system has a non-trivial solution. If $c_1 a_1 = 0$, then equation (23) implies $c_2 a_2 = 0$. Since $c_1, c_2 \ne 0$, we have $a_1 = a_2 = 0$. This violates the constraint that one of $a_1, a_2$ is non-zero. Hence, $c_1 a_1, c_2 a_2 \ne 0$. Dividing both sides of (23), (24), (25), (26) by $c_1^2 a_1$, $c_1^2 a_1^2$, $c_1^2 a_1^3$, $c_1^2 a_1^4$ respectively, we obtain the following system of polynomial equations
\[
1 + x^2 a = 0
\]
\[
1 + x^2 a^2 + 2(b + x^2 c) = 0
\]
\[
1 + x^2 a^3 + 6(b + x^2 a c) = 0
\]
\[
1 + x^2 a^4 + 12(b + x^2 a^2 c) + 12(b^2 + x^2 c^2) = 0
\]
where $x = c_2/c_1$, $a = a_2/a_1$, $b = b_1/a_1^2$, $c = b_2/a_1^2$. By taking the lexicographical order $a \succ b \succ c \succ x$, the Groebner basis of the above system contains $x^6 + 2x^4 + 2x^2 + 1 > 0$ for all $x \in \mathbb{R}$. Therefore, the above system of polynomial equations does not have real solutions. As a consequence, the original system of polynomial equations does not have a non-trivial solution, which means that $\overline{r} \le 4$. However, we have already shown that when $r = 3$, Eq. (8) has a non-trivial solution. Therefore, $\overline{r} = 4$.
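The elimination behind case (i) can be checked by hand without a Groebner basis routine. The sketch below is a verification aid under the assumption $u = x^2 > 0$: the first three reduced equations are solved for $a, b, c$ in closed form (this closed form is derived here by direct substitution, it is not quoted from the paper), and the residual of the fourth equation is compared with $-2(u^3 + 2u^2 + 2u + 1)/(3u^3)$, which is the polynomial $x^6 + 2x^4 + 2x^2 + 1$ up to a non-vanishing factor.

```python
from fractions import Fraction


def reduced_residuals(u, a, b, c):
    """Left-hand sides of the four reduced equations, written with u = x^2."""
    return (
        1 + u * a,
        1 + u * a ** 2 + 2 * (b + u * c),
        1 + u * a ** 3 + 6 * (b + u * a * c),
        1 + u * a ** 4 + 12 * (b + u * a ** 2 * c) + 12 * (b ** 2 + u * c ** 2),
    )


def solve_first_three(u):
    """Closed-form (a, b, c) solving the first three reduced equations for u > 0."""
    u = Fraction(u)
    a = -1 / u
    b = -(u + 2) / (6 * u)
    c = -(2 * u + 1) / (6 * u ** 2)
    return a, b, c


def fourth_residual(u):
    """Residual of the fourth equation once a, b, c are eliminated."""
    u = Fraction(u)
    a, b, c = solve_first_three(u)
    return reduced_residuals(u, a, b, c)[3]
```

For $u = 1$ this gives $a = -1$, $b = c = -1/2$, which is a non-trivial solution of the first three equations (the case $r = 3$), while the fourth residual equals $-4$ and is strictly negative for every $u > 0$.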

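(ii) The case $k - k_0 = 2$. System (8) when $r = 6$ takes the form:
\[
\sum_{i=1}^{3} c_i^2 a_i = 0 \quad (27)
\]
\[
\frac{1}{2}\sum_{i=1}^{3} c_i^2 a_i^2 + \sum_{i=1}^{3} c_i^2 b_i = 0 \quad (28)
\]
\[
\frac{1}{3!}\sum_{i=1}^{3} c_i^2 a_i^3 + \sum_{i=1}^{3} c_i^2 a_i b_i = 0 \quad (29)
\]
\[
\frac{1}{4!}\sum_{i=1}^{3} c_i^2 a_i^4 + \frac{1}{2!}\sum_{i=1}^{3} c_i^2 a_i^2 b_i + \frac{1}{2!}\sum_{i=1}^{3} c_i^2 b_i^2 = 0 \quad (30)
\]
\[
\frac{1}{5!}\sum_{i=1}^{3} c_i^2 a_i^5 + \frac{1}{3!}\sum_{i=1}^{3} c_i^2 a_i^3 b_i + \frac{1}{2!}\sum_{i=1}^{3} c_i^2 a_i b_i^2 = 0 \quad (31)
\]
\[
\frac{1}{6!}\sum_{i=1}^{3} c_i^2 a_i^6 + \frac{1}{4!}\sum_{i=1}^{3} c_i^2 a_i^4 b_i + \frac{1}{2!\,2!}\sum_{i=1}^{3} c_i^2 a_i^2 b_i^2 + \frac{1}{3!}\sum_{i=1}^{3} c_i^2 b_i^3 = 0 \quad (32)
\]
Non-trivial solution constraints require that $c_1, c_2, c_3 \ne 0$ and, without loss of generality, $a_1 \ne 0$. Dividing both sides of the six equations above by $c_1^2 a_1$, $c_1^2 a_1^2$, $c_1^2 a_1^3$, $c_1^2 a_1^4$, $c_1^2 a_1^5$, $c_1^2 a_1^6$, respectively, and rescaling the last three by a factor of 2, we obtain
\[
1 + x^2 a + y^2 b = 0
\]
\[
\frac{1}{2}(1 + x^2 a^2 + y^2 b^2) + c + x^2 d + y^2 e = 0
\]
\[
\frac{1}{6}(1 + x^2 a^3 + y^2 b^3) + c + x^2 a d + y^2 b e = 0
\]
\[
\frac{1}{12}(1 + x^2 a^4 + y^2 b^4) + c + x^2 a^2 d + y^2 b^2 e + c^2 + x^2 d^2 + y^2 e^2 = 0
\]
\[
\frac{1}{60}(1 + x^2 a^5 + y^2 b^5) + \frac{1}{3}(c + x^2 a^3 d + y^2 b^3 e) + c^2 + x^2 a d^2 + y^2 b e^2 = 0
\]
\[
\frac{1}{360}(1 + x^2 a^6 + y^2 b^6) + \frac{1}{12}(c + x^2 a^4 d + y^2 b^4 e) + \frac{1}{2}(c^2 + x^2 a^2 d^2 + y^2 b^2 e^2) + \frac{1}{3}(c^3 + x^2 d^3 + y^2 e^3) = 0
\]
where $x = c_2/c_1$, $y = c_3/c_1$, $a = a_2/a_1$, $b = a_3/a_1$, $c = b_1/a_1^2$, $d = b_2/a_1^2$, $e = b_3/a_1^2$. By taking the lexicographical order $a \succ b \succ c \succ d \succ e \succ x \succ y$, we can verify that the Groebner basis of the above system of polynomial equations contains a polynomial in terms of $x^2, y^2$ with all of the coefficients positive numbers, which cannot be 0 when $x, y \in \mathbb{R}$. Therefore, the original system of polynomial equations does not have a non-trivial solution. It follows that $\overline{r} \le 6$.

When $r = 5$, we retain the first five equations in the system described in the above display. By choosing $x = y = 1$, under lexicographical order $a \succ b \succ c \succ d \succ e$, we can verify that the Groebner basis contains a polynomial of $e$ with roots $e = \pm\sqrt{2}/3$ or $e = (-3 \pm \sqrt{2})/6$, while $a, b, c, d$ can be uniquely determined by $e$. Thus, the system of polynomial equations (8) has a non-trivial solution. It follows that $\overline{r} = 6$.

(iii) For the case $k - k_0 \ge 3$, we choose $c_1 = c_2 = \ldots = c_{k-k_0+1} = 1$, $a_i = b_i = 0$ for all $4 \le i \le k - k_0 + 1$. Additionally, take $a_1 = a_2 = 1$. Now, by choosing $r = 6$ in system (8), we can check by Groebner bases that this system of polynomial equations has a non-trivial solution. As a result, $\overline{r} \ge 7$.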
APPENDIX I


In this appendix, we give proofs of the following results: Theorem 3.4 regarding the characterization of strong identifiability in mixture models with matrix-variate parameters, Theorem [33] regarding preservability of strong identifiability under transformation, Theorem 4.2 for exact-fitted Gamma mixtures, Theorem 4.4 for exact-fitted location-exponential mixtures, Theorem 4.5 for exact-fitted skew-Gaussian mixtures, and Theorem 4.7 for over-fitted skew-Gaussian mixtures. Proofs of most propositions and some corollaries are also given. Proofs of Theorem 3.3, Theorem 4.3 and Theorem 4.6 are quite similar to the ones that we have already mentioned above, and are deferred to Appendix II.


7 Proofs of other main results 


fco + 1 


PROOF OF PROPOSITION iMl We choose Gn = ^ C>k(& x such that 


2=1 


SF) = (00, SO) for f = 1,..., ko, = 0?, S^^+i = S? + 


/cq+I ^ 2/?' 

Additionally, Pi = Pi — exp(—n),pf = p^ for all 2 < f < ko, and p^^^i = exp(—n). With this 

construction, we can check that wf{G,Go) = d^^'^/y/n. Now, as h?{pGn^PGo) ^ y{PG„,PGo)’ we 
have 


, 1 
-Idn where a = —. 


exp 


Wr^{Gn,Go) 


h^iPG,PGo) ^ exp -n + 


2y/n 


|/(x|0O,S^„+i)-/(x|0)',S)’)|dx, 


0 


d. 


,/3/2 


XdX 


which converges to 0 as $n \to \infty$. The conclusion of our proposition is proved.


PROOF OF COROLLARY 1331 By Theorem 3.1, there are positive constants $\epsilon = \epsilon(G_0)$ and $C_0 = C_0(G_0)$ such that $V(p_G, p_{G_0}) \ge C_0 W_1(G, G_0)$ when $W_1(G, G_0) \le \epsilon$. It remains to show that
\[
\inf_{G\in\mathcal{G}: W_1(G, G_0) > \epsilon} V(p_G, p_{G_0})/W_1(G, G_0) > 0.
\]
Assume the contrary; then we can find a sequence $G_n \in \mathcal{G}$ with $W_1(G_n, G_0) > \epsilon$ such that $V(p_{G_n}, p_{G_0})/W_1(G_n, G_0) \to 0$ as $n \to \infty$. Since $\mathcal{G}$ is a compact set, we can find $G' \in \mathcal{G}$ with $W_1(G', G_0) \ge \epsilon$ such that, along a subsequence, $G_n \to G'$ under the $W_1$ metric. It implies that $W_1(G_n, G_0) \to W_1(G', G_0)$ as $n \to \infty$. As $G' \ne G_0$, we have $\lim_{n\to\infty} W_1(G_n, G_0) > 0$. As a consequence, $V(p_{G_n}, p_{G_0}) \to 0$ as $n \to \infty$. From the hypothesis of the corollary, $V(p_{G_n}, p_{G'})$ is controlled by $W_1(G_n, G')$, so $V(p_{G_n}, p_{G'}) \to 0$ as $W_1(G_n, G') \to 0$. Thus, $V(p_{G'}, p_{G_0}) = 0$ or equivalently $p_{G_0} = p_{G'}$ almost surely. From the first-order identifiability of the family of density functions $\{f(x|\theta, \Sigma), \theta \in \Theta, \Sigma \in \Omega\}$, it implies that $G' = G_0$, which is a contradiction. This completes the proof.


7.1 Characterization of strong identifiability 

PROOF OF THEOREMiMl We present the proof for part (a). The proof for other parts are similar 
and left to Appendix II. Assume that for given A; > 1 and fc different tuples (0i, Si, mi),..., (0fc,Sfc,mfc), 
we can find Uj G M, fdj G M'^, symmetric matrices 7j G and pj G M, for j = 1, ..., A: such that: 


+ -f tr r^(x|0j, Sj, mj)'^ 7 j^ -f ^(x|0j, S^, m^) = 0, 

i=i ^ ^ 
Substituting the first derivatives of $f$, we get


k 


E 


7 ?'log((x 


9j) + (x- 


- 0 ,) 

exp ( 


(x - 




0, (33) 


where 

2ajmjT{d/2) - mjr(d/2) tr(S"Sj) + 2r]jT{d/2) 

“ 27r'^/2r(d/(2mj))|Ej|V2 ’ 

2m]T{d/2) vn]T{d/2) 

a' — _ ] ^ > _y-l/9- V = _ 3^1' _and 

Pj vr'^/2r(d/(2mj))|S,f/2 7r'^/2r(d/(2mj))|E,-|^ ’ 

^ -mj7^jT{d/2) 

h vr'^/2r(d/(2mj))|Sj|V2' 


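Without loss of generality, assume $m_1 \le m_2 \le \cdots \le m_k$. Let $\overline{i} \in [1, k]$ be the maximum index such that $m_1 = m_{\overline{i}}$. As the tuples $(\theta_i, \Sigma_i, m_i)$ are distinct, so are the pairs $(\theta_1, \Sigma_1), \ldots, (\theta_{\overline{i}}, \Sigma_{\overline{i}})$. In what follows, we represent $x$ by $x = x_1 x'$ where $x_1$ is scalar and $x' \in \mathbb{R}^d$. Define
\[
a_i = (x')^T\gamma_i' x', \quad b_i = \left[(\beta_i')^T - 2\theta_i^T\gamma_i'\right]x', \quad c_i = \theta_i^T\gamma_i'\theta_i - (\beta_i')^T\theta_i,
\]
\[
d_i = (x')^T\Sigma_i^{-1}x', \quad e_i = -2(x')^T\Sigma_i^{-1}\theta_i, \quad f_i = \theta_i^T\Sigma_i^{-1}\theta_i.
\]
Borrowing a technique from Yakowitz and Spragins [1968], since $(\theta_1, \Sigma_1), \ldots, (\theta_{\overline{i}}, \Sigma_{\overline{i}})$ are distinct, we have two possibilities:

(i) If $\Sigma_j$ are the same for all $1 \le j \le \overline{i}$, then $\theta_1, \ldots, \theta_{\overline{i}}$ are distinct. For any $i < j$, denote $\Delta_{ij} = \theta_i - \theta_j$. Note that if $x' \notin \bigcup_{1\le i<j\le\overline{i}}\{u \in \mathbb{R}^d : u^T\Delta_{ij} = 0\}$, which is a finite union of hyperplanes, then $(x')^T\theta_1, \ldots, (x')^T\theta_{\overline{i}}$ are distinct. Hence, if we choose $x' \in \mathbb{R}^d$ outside this finite union of hyperplanes, the pairs $((x')^T\theta_1, (x')^T\Sigma_1 x'), \ldots, ((x')^T\theta_{\overline{i}}, (x')^T\Sigma_{\overline{i}} x')$ are distinct.

(ii) If $\Sigma_j$ are not the same for all $1 \le j \le \overline{i}$, then we assume without loss of generality that $\Sigma_1, \ldots, \Sigma_m$ are the only distinct matrices from $\Sigma_1, \ldots, \Sigma_{\overline{i}}$, where $m \le \overline{i}$. Denote $\delta_{ij} = \Sigma_i - \Sigma_j$ for $1 \le i < j \le m$; then for $x' \notin \bigcup_{1\le i<j\le m}\{u \in \mathbb{R}^d : u^T\delta_{ij}u = 0\}$, the values $(x')^T\Sigma_1 x', \ldots, (x')^T\Sigma_m x'$ are distinct. Therefore, if $x'$ lies outside this set, which is a finite union of conics, the pairs $((x')^T\theta_1, (x')^T\Sigma_1 x'), \ldots, ((x')^T\theta_m, (x')^T\Sigma_m x')$ are distinct. Additionally, for any $\theta_j$ where $m + 1 \le j \le \overline{i}$ that shares the same $\Sigma_i$ for some $1 \le i \le m$, using the argument in the first case, we can choose $x'$ outside a finite union of hyperplanes such that these $(x')^T\theta_j$ are again distinct. Hence, for $x'$ outside a finite union of conics and hyperplanes, $((x')^T\theta_1, (x')^T\Sigma_1 x'), \ldots, ((x')^T\theta_{\overline{i}}, (x')^T\Sigma_{\overline{i}} x')$ are all different.

Combining these two cases, we can find a set $D$, which is a finite union of conics and hyperplanes, such that for $x' \notin D$, the pairs $((x')^T\theta_1, (x')^T\Sigma_1 x'), \ldots, ((x')^T\theta_{\overline{i}}, (x')^T\Sigma_{\overline{i}} x')$ are distinct. Thus, the pairs $(d_i, e_i)$ are different for $1 \le i \le \overline{i}$.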
Choose $d_{i_1} = \min_{1\le i\le\overline{i}}\{d_i\}$. Denote $J = \{1 \le i \le \overline{i} : d_i = d_{i_1}\}$. Choose $1 \le i_2 \le \overline{i}$ such that

6*2 = max {cj}. Multiply both sides of (l 3 ^ with exp —{di^xl + Si^xi + , we get 

iGJ 

+ ^*2^1 + Ci 2 )idi 2 xl + ejjXi + + 77'^ log(di2xf + 6*2X1 + 7*2) + 

^ < a' + (uj-Xi + bjXi + Cj){djx\ + e^xi + + 77 ' log(djXi + ejX + fj) > x 

exp [(^*23^1 + 6*2X1 + /*2)™'’2 - {djxj + e^xi + /j)™'-’] = 0 . ( 34 ) 

Note that if j G J\{ 72 }, dj = di^, nij = rrii^, and ej > 6 * 2 - So, 

{di^xl + 6*2X1 + /*2)™'*2 - (djxl + 6jXi + < -xi as xi is large enough. 

This implies that when xi —)> 00, 

Ai{x) = ^ ) “'• + (ojXi + 6jXi + Cj){djx‘l + 6jXi + + 77'. log{djx‘l + 6jX + fj) > x 

exp [{di^xf + 6*2X1 + /*2)™'2 - (djXi + 6jXi + 0. 

On the other hand, if j ^ J and 1 <j<l then dj > di^ and m*2 = rrij. So, 

(dijXi + 6*2X1 + /*2)""‘2 - [djx\ + 6jXi + fj)'^^ < as xi is large enough. 

This implies that when xi —)• 00, 

A2{x) = ^ |a' + (ojXi + bjXi + Cj){djxl + 6jXi + + 77' log(djXi + CjX + /j)| x 


l<j<i 


exp [{di^xf + 6*2X1 + /*2)”''2 - (djxj + 6jXi + 0. 


Or else, if j > i, then rrij > rrii^. So, {di^x\ + 6 * 3 X 1 + /* 2)™'*2 — {djx\ + 6 jXi + fj)"^^ < —xf^^. As 
a result, 

AsA) = ^|a' + {ujxf + 6 jXi + Cj){djx\ + 6 jXi + + r]j \og{djxl + CjX + /j)| x 

j>i 

exp [{di^xl + 6*2X1 + /*2)”'‘2 - [djx\ + 6jXi + /j)”*"] ^ 0. 

Now, by letting xi 00 , 

^ < a'■ + (ojXi + bjXi + Cj){djx\ + CjXi + + Vj'j \og{djx\ + ejX + fj) > x 

j^i2 ^ ^ 

exp [{di^xl + 6*2X1 + /*2)™'2 - [djxl + 6jXi + = Ai(x) + A2(x) + A^x) 0 .( 35 ) 

Combining (34) and (35), we obtain that as $x_1 \to \infty$
\[
\alpha_{i_2}' + (a_{i_2}x_1^2 + b_{i_2}x_1 + c_{i_2})(d_{i_2}x_1^2 + e_{i_2}x_1 + f_{i_2})^{m_{i_2}-1} + \eta_{i_2}'\log(d_{i_2}x_1^2 + e_{i_2}x_1 + f_{i_2}) \to 0.
\]
The only possibility for this result to happen is $a_{i_2} = b_{i_2} = \eta_{i_2}' = 0$. Or, equivalently, $(x')^T\gamma_{i_2}'x' = \left[(\beta_{i_2}')^T - 2\theta_{i_2}^T\gamma_{i_2}'\right]x' = 0$. If $\gamma_{i_2}' \ne 0$, we can choose the element $x' \notin D$ lying outside the set $\{u \in \mathbb{R}^d : u^T\gamma_{i_2}'u = 0\}$. It means that $(x')^T\gamma_{i_2}'x' \ne 0$, which is a contradiction. Therefore, $\gamma_{i_2}' = 0$. It implies that $(\beta_{i_2}')^T x' = 0$. If $\beta_{i_2}' \ne 0$, we can choose $x' \notin D$ such that $(\beta_{i_2}')^T x' \ne 0$. Hence, $\beta_{i_2}' = 0$. With these results, $\alpha_{i_2}' = 0$. Overall, we obtain $\alpha_{i_2}' = \beta_{i_2}' = \gamma_{i_2}' = \eta_{i_2}' = 0$. Repeating the same argument for the remaining parameters $\alpha_j', \beta_j', \gamma_j', \eta_j'$, we get $\alpha_j' = \beta_j' = \gamma_j' = \eta_j' = 0$ for $1 \le j \le k$.

It is also equivalent that $\alpha_j = \beta_j = \gamma_j = \eta_j = 0$ for all $1 \le j \le k$.

This concludes the proof of part (a) of our theorem.

PROOF OF THEOREMI331 The proof is a straightforward application of the chain rule. 


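"If" direction: Let $k \ge 1$ and let $(\eta_1^*, \Lambda_1^*), (\eta_2^*, \Lambda_2^*), \ldots, (\eta_k^*, \Lambda_k^*) \in \Theta^*\times\Omega^*$ be $k$ different pairs. Suppose there are $\alpha_i \in \mathbb{R}$, $\beta_i \in \mathbb{R}^{d_1}$, and symmetric matrices $\gamma_i \in \mathbb{R}^{d_2\times d_2}$ such that
\[
\sum_{i=1}^{k} \alpha_i g(x|\eta_i^*, \Lambda_i^*) + \beta_i^T\frac{\partial g}{\partial\eta}(x|\eta_i^*, \Lambda_i^*) + \mathrm{tr}\left(\frac{\partial g}{\partial\Lambda}(x|\eta_i^*, \Lambda_i^*)^T\gamma_i\right) = 0 \quad \text{for almost all } x. \quad (36)
\]
Let $(\theta_i, \Sigma_i) := T(\eta_i^*, \Lambda_i^*)$ for $i = 1, \ldots, k$. Since $T$ is bijective, $(\theta_1, \Sigma_1), (\theta_2, \Sigma_2), \ldots, (\theta_k, \Sigma_k)$ are distinct. By the chain rule,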


di 


^9 f \ AN df 


/ = 1 
6^1 


l<U,V<d2 


UTfi 




1=1 


^[Tl{rJ,A)]^ 

drii 


+ E 

l<U,V<d2 




as 


dr]i 


and similarly, 
dg 


dA 




1 = 1 ■' l<U,V<d2 ■' 


where 7 = ( 71 ,..., 7 ^^ and S = [S^] where 1 < < ^ 2 - Equation (1^ can be rewritten accord¬ 

ingly as follows 


aif{x\ei, Sj) + iPlf^ixlOi, Si) + tr f = 0 

i=l ^ '' 


= 0 for almost all x. 


(37) 


where /?' = ((/3')\ • • •, (/3')"0. t' = [ 7 ']“". V^ = {{gi)\ • • •, {v^)H /3t = (/?/,..., ), 

7 i = [ 7 *]“’^, and for all 1 < j < di 


(ft)'-Eft + E 


h=l 


7 


l<U,V<d2 


a(Ai)“^ 


and for all I < j, I < d 2 


^ ^ l<U,V<d2 


h=l 


7r 


dlT2(g*,A*)j., 

a(Ai)“^ 


Given that $\{f(x|\theta, \Sigma), \theta \in \Theta, \Sigma \in \Omega\}$ is identifiable in the first order, Eq. (37) entails that $\alpha_i = 0$, $\beta_i' = 0 \in \mathbb{R}^{d_1}$, and $\gamma_i' = 0 \in \mathbb{R}^{d_2\times d_2}$. From the definition of the modified Jacobian matrix $J$, the equations $\beta_i' = 0$ and $\gamma_i' = 0$ are equivalent to the system of equations $J(\eta_i^*, \Lambda_i^*)\tau_i = 0$, where $\tau_i$ is the vector collecting the entries of $\beta_i$ and $\gamma_i$. Since $|J(\eta_i^*, \Lambda_i^*)| \ne 0$, the above system of equations has the unique solution $\tau_i = 0$ for all $1 \le i \le k$. These results imply that $\beta_i = 0 \in \mathbb{R}^{d_1}$ and $\gamma_i = 0 \in \mathbb{R}^{d_2\times d_2}$. Thus, $g$ is also identifiable in the first order.


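"Only if" direction. Assume by contrary that the modified Jacobian matrix $J(\eta, \Lambda)$ is not non-singular for all $(\eta, \Lambda) \in \Theta^*\times\Omega^*$. Then, we can find $(\eta_0, \Lambda_0) \in \Theta^*\times\Omega^*$ such that $J(\eta_0, \Lambda_0)$ is a singular matrix. Choose $k = 1$ and assume that we can find $\alpha_1 \in \mathbb{R}$, $\beta_1 \in \mathbb{R}^{d_1}$ and a symmetric matrix $\gamma_1 \in \mathbb{R}^{d_2\times d_2}$ such that:
\[
\alpha_1 g(x|\eta_0, \Lambda_0) + \beta_1^T\frac{\partial g}{\partial\eta}(x|\eta_0, \Lambda_0) + \mathrm{tr}\left(\frac{\partial g}{\partial\Lambda}(x|\eta_0, \Lambda_0)^T\gamma_1\right) = 0 \quad \text{for almost all } x.
\]
The first-order identifiability of the class $\{g(x|\eta, \Lambda), \eta \in \Theta^*, \Lambda \in \Omega^*\}$ implies that $\alpha_1 = 0$, $\beta_1 = 0 \in \mathbb{R}^{d_1}$ and $\gamma_1 = 0 \in \mathbb{R}^{d_2\times d_2}$ is the only possibility for the above equation to hold. However, by the same argument as in the first part of the proof, we may rewrite the above equation as
\[
\alpha_1 f(x|\theta_0, \Sigma_0) + (\beta_1')^T\frac{\partial f}{\partial\theta}(x|\theta_0, \Sigma_0) + \mathrm{tr}\left(\frac{\partial f}{\partial\Sigma}(x|\theta_0, \Sigma_0)^T\gamma_1'\right) = 0 \quad \text{for almost all } x,
\]
where $T(\eta_0, \Lambda_0) = (\theta_0, \Sigma_0)$, and $\beta_1', \gamma_1'$ have the same formula as given above. The first-order identifiability of $\{f(x|\theta, \Sigma), \theta \in \Theta, \Sigma \in \Omega\}$ implies that $\beta_1' = 0 \in \mathbb{R}^{d_1}$ and $\gamma_1' = 0 \in \mathbb{R}^{d_2\times d_2}$. The last equation leads to the system of equations $J(\eta_0, \Lambda_0)\tau = 0$, where $\tau$ is the vector collecting the entries of $\beta_1$ and $\gamma_1$.

However, the singularity of the matrix $J(\eta_0, \Lambda_0)$ leads to non-uniqueness of the solution $\tau$ of this system of equations. This contradicts the uniqueness of the solution $\alpha_1 = 0$, $\beta_1 = 0 \in \mathbb{R}^{d_1}$ and $\gamma_1 = 0 \in \mathbb{R}^{d_2\times d_2}$. The proof is complete.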


7.2 Over-fitted location-covariance Gaussian mixtures 

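Lemma 7.1. Let $\{f(x|\theta, \Sigma), \theta \in \mathbb{R}^d, \Sigma \in S_d^{++}\}$ be a class of multivariate Gaussian distributions. Then, $\frac{\partial^2 f}{\partial\theta^2}(x|\theta, \Sigma) = 2\frac{\partial f}{\partial\Sigma}(x|\theta, \Sigma)$ for all $\theta \in \mathbb{R}^d$ and $\Sigma \in S_d^{++}$.

Proof. Direct calculation yields
\[
\frac{\partial^2 f}{\partial\theta^2}(x|\theta, \Sigma) = \left(\Sigma^{-1}(x-\theta)(x-\theta)^T\Sigma^{-1} - \Sigma^{-1}\right)f(x|\theta, \Sigma),
\]
\[
\frac{\partial f}{\partial\Sigma}(x|\theta, \Sigma) = \frac{1}{2}\left(\Sigma^{-1}(x-\theta)(x-\theta)^T\Sigma^{-1} - \Sigma^{-1}\right)f(x|\theta, \Sigma),
\]
where $f(x|\theta, \Sigma) = (2\pi)^{-d/2}|\Sigma|^{-1/2}\exp\left(-\frac{(x-\theta)^T\Sigma^{-1}(x-\theta)}{2}\right)$. From these results, we can easily check the conclusion of our lemma. □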


PROOF OF PROPOSITION m We only consider the case k — ko = 1 (the proof for the case 
A: — /co = 2 is rather similar, and deferred to Appendix II). As in the proof of Theorem 14. 1 1 it suffices 
to show for 7 = 1 that 

\[
\lim_{\epsilon\to 0}\inf_{G\in\mathcal{O}_k(\Theta\times\Omega)}\left\{\sup_{x\in\mathcal{X}} |p_G(x) - p_{G_0}(x)|/W_4^4(G, G_0) : W_4(G, G_0) \le \epsilon\right\} > 0. \quad (38)
\]








Denote v = cj^. Assume that the above result does not hold, i.e we can find a sequence of Gn = 

ko+m Si 

E E Go in W4 where (p”-, 6»”-, u"-) ^ (p°, 6»°, uf) for all 1 < i < feo + m, 1 < j < 

i=l j=l 

Si and = 0 as feo + 1 < f < feo + m. As A: — fco = we have m < 1 . Repeating the same arguments 

ko+m Si 

as the proof of Theorem | 47 T] up to Step 8, and noting that ^ Y 1 /d{Gn, Go) — 5 - 0 , we can 

1=1 j=i 

find z* E { 1 , 2 ,..., fco + m} such fhaf as long as 1 < a < 4 


+iA<*,n 




i=i 


* j 




E^r*, E 


(A 0 ; 






Epr*,iA0ii,|4 


i=i 


ni,n 2 


ni!n 2 ! 


0 ,( 39 ) 


i=i 




i=i 


where ni + 2n2 = a and 1 < a < 4 . As z* E { 1 , 2 ,..., /cq + m}, we have z* E ko} or 

i* E {ko + 1 ,... ,ko + m}. Firsfly, we assume thaf i* E { 1 ,..., /cq}- Wifhouf loss of generalify, lef 
i* = 1 . Since si < A: — /sq + 1 = 2 , fhere are fwo possibilifies. 


Case 1. If Si = 1 , fhen F[{ 9 \, v^) = A 0 ”^/| 0 , which is a confradicfion. 


Case 2. If si = 2 , wifhouf loss of generalify, we assume fhaf p”^|A 0 ”il < P12IA012I for infinifely 
many n, which we can assume fo hold for all n (by choosing fhe subsequence). Since p”^(A 0 ”]^)‘^ + 
Pi2{A9i2)^ > 0 , we obfain 0^2 / 0 for all n. If A 9 ii = 0 for infinitely many n, fhen F[{ 9 i,Vi) = 
A0”2/(A0i 2)^ 7^ which is a confradicfion. Therefore, we may assume 0 ”]^ 7^ 0 for all n. Lef 
a := lim p?, A 0 ])i/p?2A0f2 £ [“ 1 , !]■ Dividing bofh fhe numerator and denominator of F(( 0 ?,z; 9 ) 

n—^■oo 

by pY^2A0i 2 and telling n —00, we obfain o = — 1 . Consider fhe following scenarios regarding 
PiiIPu- 


(i) If Pii/Pi2 —> 00, fhen A0”^/A0f2 —)• 0. Since A0f^,A05*2 / 0, denote Az;”^ = A:”(A 0 f]^)^, 
Av'^2 = ^2 (A0i 2)^ f®'" Now, by dividing fhe numerator and denominator of ^2(05*, u^), F^{ 9 \, v^), 
F{{ 9 i, Vi) by p”2(A0i2)^, Pi2(A0i 2)^, and p”2(A0i2)‘^ respectively, we obfain 


MnA — 


Fin ,2 — 


Mn,3 — 


1 , , n , , nFll(A^ 

2+^2+ l^n^(A0-2)2 

i , p , 

3! 


1 , M , IMl' 

4 ! 2 2 


1 


+ 


un 

E + 

2 


0 , 

^ 0 , 


\ Ki(a^ 

) pU^9^2)^ 


0 . 


If |A:f|,|A;2l 

leasf one of |A:”|, |A:2 | does nol converge to 00. If |A:" 


00 fhen Mn,z > ^ for sufficienfly large n, which is a confradicfion. Therefore, al 

'• 00 and |A:2 I -/A 00 fhen Mn,i implies fhaf 




P?i(A0 


ID 


pU^oi 


-f+ 00. Therefore, |A:" 


Ki(A 0 ^i)^ 

F?2(A0f2)' 


0 as A0fi/A05^2 


(A0 


ID 




0 


^12)^ i^l2V^"12t ^'^12 

1 1 A;” 

as Pii/Pi2 oo- Combining Ihese resulls wifh Mn,3,Mn,4, we gel A:^ + ^ 0 and ^ + “^ + 


—I-)■ 0, which cannol happen. If |A:”| -Y 00, fhen Mn,i and Mn,2 implies fhaf A:^ + 1/2 —>• 0 and 

A:2 + 1/6 — 0 , which cannof happen eifher. As a consequence, pYIPi2 oo- 
















(ii) If $p_{11}^n/p_{12}^n \to 0$ then $p_{12}^n/p_{11}^n \to \infty$. Since $p_{11}^n\Delta\theta_{11}^n/(p_{12}^n\Delta\theta_{12}^n) \to -1$, we have $|\Delta\theta_{11}^n/\Delta\theta_{12}^n| \to \infty$ or equivalently $\Delta\theta_{12}^n/\Delta\theta_{11}^n \to 0$. From here, using the same argument as that above, we are also led to a contradiction. So, $p_{11}^n/p_{12}^n \not\to 0$.

(iii) If $p_{11}^n/p_{12}^n \to b \notin \{0, \infty\}$, then $\Delta\theta_{11}^n/\Delta\theta_{12}^n \to -1/b$. Therefore, by dividing the numerator and denominator of the ratios in (39) for $\alpha = 2, 3, 4$ by $p_{12}^n(\Delta\theta_{12}^n)^2$, $p_{12}^n(\Delta\theta_{12}^n)^3$, and $p_{12}^n(\Delta\theta_{12}^n)^4$ respectively and letting $n \to \infty$, we arrive at the scaling system of equations (8) when $r = 4$, for which we already know that a non-trivial solution does not exist. Therefore, the case $s_1 = 2$ cannot happen.

As a consequence, $i^* \notin \{1, \ldots, k_0\}$. However, since $m \le 1$, we have $i^* = k_0 + 1$. This implies that $s_{k_0+1} = 1$, in which case we already know from Case 1 that (39) cannot hold. This concludes the proof.


7.3 Mixture of Gamma distributions 

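PROOF OF THEOREM 4.2 (a) For the range of generic parameter values of $G_0$, we shall show that the first-order identifiability still holds for Gamma mixtures, so that the conclusion can be drawn immediately from Theorem 3.1. It suffices to show that for any $\alpha_{ij} \in \mathbb{R}$ ($1 \le i \le 3$, $1 \le j \le k_0$) such that for almost all $x > 0$
\[
\sum_{i=1}^{k_0} \alpha_{1i}f(x|a_i^0, b_i^0) + \alpha_{2i}\frac{\partial f}{\partial a}(x|a_i^0, b_i^0) + \alpha_{3i}\frac{\partial f}{\partial b}(x|a_i^0, b_i^0) = 0 \quad (40)
\]
then $\alpha_{ij} = 0$ for all $i, j$. Equation (40) is rewritten as
\[
\sum_{i=1}^{k_0}\left(\beta_{1i}x^{a_i^0-1} + \beta_{2i}\log(x)x^{a_i^0-1} + \beta_{3i}x^{a_i^0}\right)\exp(-b_i^0 x) = 0, \quad (41)
\]
where
\[
\beta_{1i} = \alpha_{1i}\frac{(b_i^0)^{a_i^0}}{\Gamma(a_i^0)} + \alpha_{2i}\frac{(b_i^0)^{a_i^0}(\log(b_i^0) - \psi(a_i^0))}{\Gamma(a_i^0)} + \alpha_{3i}\frac{a_i^0(b_i^0)^{a_i^0-1}}{\Gamma(a_i^0)}, \quad \beta_{2i} = \alpha_{2i}\frac{(b_i^0)^{a_i^0}}{\Gamma(a_i^0)}, \quad \beta_{3i} = -\alpha_{3i}\frac{(b_i^0)^{a_i^0}}{\Gamma(a_i^0)}.
\]
Without loss of generality, we assume that $b_1^0 \le b_2^0 \le \cdots \le b_{k_0}^0$. Denote by $\overline{i}$ the maximum index $i$ such that $b_i^0 = b_1^0$. Multiplying both sides of (41) by $\exp(b_{\overline{i}}^0 x)$ and letting $x \to +\infty$, we obtain
\[
\sum_{i=1}^{\overline{i}} \beta_{1i}x^{a_i^0-1} + \beta_{2i}\log(x)x^{a_i^0-1} + \beta_{3i}x^{a_i^0} \to 0.
\]
Since $|a_i^0 - a_j^0| \ne 1$ and $a_i^0 \ge 1$ for all $1 \le i, j \le \overline{i}$, the above result implies that $\beta_{1i} = \beta_{2i} = \beta_{3i} = 0$ for all $1 \le i \le \overline{i}$, or equivalently $\alpha_{1i} = \alpha_{2i} = \alpha_{3i} = 0$ for all $1 \le i \le \overline{i}$. Repeating the same argument for the remaining indices, we obtain $\alpha_{1i} = \alpha_{2i} = \alpha_{3i} = 0$ for all $1 \le i \le k_0$. This concludes the proof.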
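The role of the condition $|a_i^0 - a_j^0| \ne 1$ can be seen from a standard identity for the Gamma density $f(x|a, b) = b^a x^{a-1}e^{-bx}/\Gamma(a)$, namely $\frac{\partial f}{\partial b}(x|a, b) = \frac{a}{b}\left(f(x|a, b) - f(x|a+1, b)\right)$: when two components share the same rate and their shapes differ by exactly one, a first derivative becomes a linear combination of the densities themselves. The sketch below checks this identity by finite differences; the function names and test points are illustrative assumptions.

```python
import math


def gamma_pdf(x, a, b):
    """Gamma density with shape a and rate b, evaluated at x > 0."""
    return b ** a * x ** (a - 1.0) * math.exp(-b * x) / math.gamma(a)


def rate_derivative(x, a, b, h=1e-6):
    """Central finite difference approximating d f / d b."""
    return (gamma_pdf(x, a, b + h) - gamma_pdf(x, a, b - h)) / (2.0 * h)


def identity_gap(x, a, b):
    """|d f/d b - (a/b) * (f(x|a,b) - f(x|a+1,b))|, which should be near zero."""
    rhs = (a / b) * (gamma_pdf(x, a, b) - gamma_pdf(x, a + 1.0, b))
    return abs(rate_derivative(x, a, b) - rhs)
```

This linear dependence is what the construction in part (b) exploits.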

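(b) Without loss of generality, we assume that $\{|a_2^0 - a_1^0|, |b_2^0 - b_1^0|\} = \{1, 0\}$. In particular, $b_1^0 = b_2^0$ and assume $a_2^0 = a_1^0 - 1$. We construct the following sequence of measures $G_n = \sum_{i=1}^{k_0} p_i^n\delta_{(a_i^n, b_i^n)}$, where $a_i^n = a_i^0$ for all $1 \le i \le k_0$, $b_1^n = b_1^0$, $b_2^n = b_2^0\left(1 + \frac{1}{a_2^0(np_2^0 - 1)}\right)$, $b_i^n = b_i^0$ for all $3 \le i \le k_0$, $p_1^n = p_1^0 + 1/n$, $p_2^n = p_2^0 - 1/n$, $p_i^n = p_i^0$ for all $3 \le i \le k_0$. We can check that $W_r^r(G_n, G_0) \asymp 1/n + (p_2^0 - 1/n)|b_2^n - b_2^0|^r \asymp n^{-1}$ as $n \to \infty$. For any natural order $r \ge 1$, by applying Taylor's expansion up to the $([r] + 1)$th order, we obtain:
\[
p_{G_n}(x) - p_{G_0}(x) = \sum_{i=1}^{k_0} p_i^n\left(f(x|a_i^n, b_i^n) - f(x|a_i^0, b_i^0)\right) + (p_i^n - p_i^0)f(x|a_i^0, b_i^0)
\]
\[
= (p_1^n - p_1^0)f(x|a_1^0, b_1^0) + (p_2^n - p_2^0)f(x|a_2^0, b_2^0) + p_2^n\sum_{j=1}^{[r]+1}\frac{(b_2^n - b_2^0)^j}{j!}\frac{\partial^j f}{\partial b^j}(x|a_2^0, b_2^0) + R_n(x). \quad (42)
\]
The Taylor expansion remainder satisfies $|R_n(x)| = O(p_2^n|b_2^n - b_2^0|^{[r]+1+\delta})$ for some $\delta > 0$ due to $a_2^0 \ge 1$. Therefore, $R_n(x) = o(W_r^r(G_n, G_0))$ as $n \to \infty$. For the choice of $p_2^n, b_2^n$, we can check that for $j \ge 2$, $p_2^n(b_2^n - b_2^0)^j = o(W_r^r(G_n, G_0))$. Now, we can rewrite (42) as
\[
p_{G_n}(x) - p_{G_0}(x) = A_n x^{a_1^0-1}\exp(-b_1^0 x) + B_n x^{a_2^0-1}\exp(-b_2^0 x) + \sum_{j=2}^{[r]+1} p_2^n\frac{(b_2^n - b_2^0)^j}{j!}\frac{\partial^j f}{\partial b^j}(x|a_2^0, b_2^0) + R_n(x),
\]
where we have $A_n = \frac{(b_1^0)^{a_1^0}}{\Gamma(a_1^0)}(p_1^n - p_1^0) - \frac{(b_2^0)^{a_2^0}}{\Gamma(a_2^0)}p_2^n(b_2^n - b_2^0) = 0$ and similarly $B_n = \frac{(b_2^0)^{a_2^0}}{\Gamma(a_2^0)}(p_2^n - p_2^0) + \frac{a_2^0(b_2^0)^{a_2^0-1}}{\Gamma(a_2^0)}p_2^n(b_2^n - b_2^0) = 0$ for all $n$. Since $a_2^0 \ge 1$, $\frac{\partial^j f}{\partial b^j}(x|a_2^0, b_2^0)$ is bounded for all $2 \le j \le r + 1$. It follows that $\sup_{x>0}|p_{G_n}(x) - p_{G_0}(x)| = O(n^{-2})$. Observe that
\[
V(p_{G_n}, p_{G_0}) = 2\int_{p_{G_n}(x) < p_{G_0}(x)} (p_{G_0}(x) - p_{G_n}(x))\,dx \le 2\int_{x\in(0, a_2^0/b_2^0)} |p_{G_n}(x) - p_{G_0}(x)|\,dx.
\]
As a consequence $V(p_{G_n}, p_{G_0}) = O(n^{-2})$, so for any $r \ge 1$, $V(p_{G_n}, p_{G_0}) = o(W_r^r(G_n, G_0))$ as $n \to \infty$.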


PROOF OF THEOREM m (a) By the same argument as the beginning of the proof of Theorem 
13.21 it suffices to show that 


lirn inf | sup \pg{x) - pgo{x)\/W2{G, Go) : FF 2 (G, Go) < el > 0 . (43) 

e^OGeOfc,,o(e) [x(^X } 

Suppose this does not hold, by repeating the arguments of the aforementioned proof, there is a sequence 
Gn = Yj'fl P?jb{aY,bY) Go such that (a”-,5”) ^ {o!l,h^i) for a\\ I < i < k* where p° = 0 as 

i=l j=i 

ko + I < i < k*. Invoke the Taylor expansion up to second order, as we let n 00 , we have for 
almost surely x 


k* 

PG„{x) - PGoix) r/ I 0 >o^ , 3 i 0 > 0 n , 1 0 , 0 n , 

- cl{G Go) -^ 2^^-^(^iif{x\ai ,bi) + a 2 i-^ix\ai ,bi) + a3i-^{x\ai ,bi) + 

fl ^?) + 2 ^ Oi4ija5ij-^{x\alb^)\ = 0 , 

7 = 1 7 = 1 7 = 1 ■' 


(44) 
Si Si Si

where at least one of aij, a2i, a^i, X] ^ differs from 0 . We can rewrite the 

j=i j=i j=i 

above equation as 

r s 

log(x)x“ilog(x)^x“i"^ + log(x)a:“i = 0, 

i=l ^ ^ 

aP-l 

4 _ 
i=i 


where + /3, ^ J + 


a0(60)“?-i ^ aUa^-l){b^r° 

'X ^ <^5ij 


^-2 


f o d (jb^r^\ 4^ s =_^ mi, of .>2 

(^r(„0)J+2g«wa57,g^ /32* “3*+2 


9 /a0(60y 


0t^0^a0-l' 


J=1 

Si 


naT 

) 

r+2 E 




a0(60)«°-i 


,5i 4'^ 1 r(a0) 


5i 


+ 


- , ^ _ffi_+2f „2 s ((bfr’ 

’ *■ j5i ®«r(a,»)' *• '^•r(n») vr(a,») 

a'^( t)V j^i Si 

2 x; «4»i«5»i 02 —. /^5i = E and /^gi = -2 ^ «4iia5ij-^-^. Using the same ar- 

J=1 l®i J J=1 J _ j=l ^ V®i J 

gument as that of the proof of part (a) of Theorem 14.21 by multiplying both sides of the above equation 
with exp(62x) and let x +oo, we obtain 


i=i 

Si 

E 

7 = 1 


,0(50)“?-! 


/3hx“°-^ + /32iX“? + /33iX“°+l + /337 log(x)x“?-^ 

i=l 


+ f 34 i log(x)^x“i ^ + /Ssi log(x)x“^ ^ 0. 


By the constraints of Ok^co^ we have |a? — o^l ^ {1, 2} for all 1 < i,j < k*. Therefore, this limit 
yields Pu = f32i = f^si = Pa = /^st = 0 for all 1 < z < i or equivalently au = a 2 i = asi = auj = 
ct 5 ij = 0 for all 1 < i < i, 1 < j < Sj. The same argument yields an = a 2 i = a^i = a^j = a^ij = 0 
for all 1 < z < feo, 1 < j < Si, which leads to contradiction. This concludes the proof. 

fco+l 

(b) The proof is similar to part (b) of Theorem 14.21 We choose sequence Gn = Y Pi'b{a^,b^) hy 

2=1 ^ ^ 

letting af = a? for all 2 < z < /cq + 1> a" = + 1, b^ = 6?, b^ = b^{l + ^- —), bf = b^i 

for all 3 < z < ko + 1, Pi = Ijn, P 2 = Pi — 1/n, pf = p^_i for all 3 < z < fco + 1. Given this 
construction, we can check that as r > 1, {Gn, Go) = 1/n + (p^ — l/njjb^ — 65|’'. The remainder 
of the proof is proceeds in the same way as that of Theorem 14.21 

c) If there exists (i,j) such that ||a3 — a^|, \h^ “ ^jl| = then we can use the same way 

of construction as that of part (b). Now, the only case of interest is when we have some (z,y) such 
that |lf “ |6^ — 6^|| = {2,0}. Without loss of generality, assume that = a? — 2. We 


fco + l 


construct the sequence Gn = Y P 7 b{av-,b") as a” = a^, = O3 = a^, a” = a^i for all 4 < z < 

2=1 

hO 

ko + 1 , 63 = 63, b2 — b\ = h\ — bf = b^i for all 4 < z < fco + 1 , Pi = p? — c„. 


anZi 


„o 

P2 


n 


„o 

P2 


n 


P2 = ^ ^ (cn + - ] ,P3 = ^ ^ {cn- -], pf = P°_i for all 4 < z < feg + 1 . where 


(Qj^ —j— X 

Gi = -I ■ Now, we can check that for any r > 1 , 147 {Gn, Go) > c„ + — . As r > 2 , by 

(2n^ — 1)02 “ f ^ 
means of Taylor expansions up to the $([r] + 1)$-th order, we obtain

3 

PgAx) - PGoix) = (pi - p?)/(x|a?, 6?) + C^Pi- p^)/(x|a^, b^) 


r+i 

+ y~! ^ 2 ) +(45) 

i=i 

where Rn{x) is the remainder term and therefore |i?„(x)|/lT7^(G„, Gq) —)• 0. We can check that as 
3 

i > 3, E p7{K - 6DV(4";(G'n,G'o) ^ 0 as n ^ 00 . Additionally, direct computation demonstrates 

i=2 

that 


3 2 EPii^i -b'^iV 

{Pi - Pi)f{x\a°i,b^i) + {^p 2 - P2)f{x\a^, 6°) + - -p-^ 2 ) = 0- 

i=2 j=l 


j\ dip 

The rest of the proof goes through in the same way as that of Theorem 4.2 part (b).


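PROOF OF THEOREM 4.4. Choose the sequence $G_n = \sum_{i=1}^{k_0} p_i^n \delta_{(\theta_i^n, \sigma_i^n)}$ such that $\sigma_i^n = \sigma_i^0$ for all $1 \le i \le k_0$, $(p_i^n, \theta_i^n) = (p_i^0, \theta_i^0)$ for all $3 \le i \le k_0$. The parameters $p_1^n, p_2^n, \theta_1^n, \theta_2^n$ are to be determined. With this construction of $G_n$, we obtain $W_1(G_n, G_0) \asymp |p_1^n - p_1^0| + |p_2^n - p_2^0| + p_1^0|\theta_1^n - \theta_1^0| + p_2^0|\theta_2^n - \theta_2^0|$.

Now, for any $x \notin \{\theta_1^0, \theta_2^0\}$ and for any $r \ge 1$, taking the Taylor expansion with respect to $\theta$ up to $([r]+1)$-th order, we obtain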


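$$
\begin{aligned}
p_{G_n}(x) - p_{G_0}(x) &= \sum_{i=1}^{2} p_i^0\left(f(x|\theta_i^n,\sigma_i^0) - f(x|\theta_i^0,\sigma_i^0)\right) + (p_i^n - p_i^0) f(x|\theta_i^n,\sigma_i^0) \\
&= \sum_{i=1}^{2}\left[-p_i^0 \sum_{j=1}^{[r]+1} \frac{(\theta_i^0-\theta_i^n)^j}{j!}\frac{\partial^j f}{\partial\theta^j}(x|\theta_i^n,\sigma_i^0) + (p_i^n-p_i^0) f(x|\theta_i^n,\sigma_i^0)\right] + R(x) \\
&= \sum_{i=1}^{2}\left[p_i^n - p_i^0 - p_i^0\sum_{j=1}^{[r]+1}\frac{(\theta_i^0-\theta_i^n)^j}{j!(\sigma_i^0)^j}\right] f(x|\theta_i^n,\sigma_i^0) + R(x),
\end{aligned}
$$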


where the last inequality is due to the identity (ITT]) and R{x) is remainder of Taylor expansion. Note 
that 


sup |i?(x)i/fF[(G„,Go) < j;o(|0r - ^ o. 

x^{ele°} 


2=1 


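Now, we choose $p_1^n = p_1^0 + 1/n$, $p_2^n = p_2^0 - 1/n$, which means $p_1^n + p_2^n = p_1^0 + p_2^0$ and $p_1^n \to p_1^0$, $p_2^n \to p_2^0$. As $p_i^0/(j!(\sigma_i^0)^j)$ are fixed positive constants for all $1 \le j \le [r]+1$, it is clear that there exist sequences $\theta_1^n$ and $\theta_2^n$ such that for both $i = 1$ and $i = 2$, $\theta_i^n - \theta_i^0 \to 0$ and the identity
$$p_i^0 \sum_{j=1}^{[r]+1} \frac{(\theta_i^0-\theta_i^n)^j}{j!(\sigma_i^0)^j} = p_i^n - p_i^0$$
holds for all $n$ (sufficiently large). With these choices of $p_1^n, p_2^n, \theta_1^n, \theta_2^n$, we have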


$$\sup_{x \notin \{\theta_1^0,\theta_2^0\}} |p_{G_n}(x) - p_{G_0}(x)|/W_1^r(G_n,G_0) = \sup_{x \notin \{\theta_1^0,\theta_2^0\}} |R(x)|/W_1^r(G_n,G_0) \to 0.$$

To conclude the proof, note that there exists a positive constant $m_1$ such that $m_1 > \min\{\theta_1^0,\theta_2^0\}$ and for sufficiently large $n$,
$$V(p_{G_n},p_{G_0})/W_1^r(G_n,G_0) \lesssim \int_{x \in (\min\{\theta_1^0,\theta_2^0\},\, m_1)\setminus\{\theta_1^0,\theta_2^0\}} |p_{G_n}(x)-p_{G_0}(x)|/W_1^r(G_n,G_0)\,dx \to 0.$$
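The cancellation used in this proof can be illustrated numerically. The sketch below is not the truncated Taylor construction of the proof but its limiting form, under the assumption that the family is the location-exponential density $f(x|\theta,\sigma) = \sigma^{-1}\exp(-(x-\theta)/\sigma)$ for $x > \theta$: choosing $\theta_i^n = \theta_i^0 + \sigma_i^0\log(p_i^0/p_i^n)$ makes $p_i^n f(x|\theta_i^n,\sigma_i^0) = p_i^0 f(x|\theta_i^0,\sigma_i^0)$ exactly for every $x$ to the right of both locations. The function names and the numerical values of $G_0$ are hypothetical.

```python
import math

def loc_exp_pdf(x, theta, sigma):
    """Location-exponential density (assumed form), zero for x <= theta."""
    return math.exp(-(x - theta) / sigma) / sigma if x > theta else 0.0

def mixture_pdf(x, comps):
    """comps is a list of (weight, theta, sigma) triples."""
    return sum(p * loc_exp_pdf(x, th, s) for p, th, s in comps)

def perturbed(comps0, n):
    """Shift weight 1/n from component 2 to component 1 and move the two
    locations so that each weighted component is unchanged to the right
    of its location (illustrative choice, see the lead-in)."""
    (p1, t1, s1), (p2, t2, s2) = comps0[:2]
    q1, q2 = p1 + 1.0 / n, p2 - 1.0 / n
    moved = [(q1, t1 + s1 * math.log(p1 / q1), s1),
             (q2, t2 + s2 * math.log(p2 / q2), s2)]
    return moved + list(comps0[2:])

G0 = [(0.4, 0.0, 1.0), (0.6, 1.0, 2.0)]   # hypothetical true mixing measure
Gn = perturbed(G0, 100)

# Densities agree away from the short intervals between old and new locations.
far_gap = max(abs(mixture_pdf(x, Gn) - mixture_pdf(x, G0)) for x in (1.5, 2.0, 5.0, 10.0))
# They differ only on those short intervals, e.g. just to the right of theta_2^0 = 1.
near_gap = abs(mixture_pdf(1.02, Gn) - mixture_pdf(1.02, G0))
```

The mixture densities differ only on intervals whose length shrinks with $n$, which is the mechanism behind the integral bound that closes the proof.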

7.4 Mixture of skew-Gaussian distributions 


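Lemma 7.2. Let $\{f(x|\theta,\sigma,m), (\theta,m) \in \mathbb{R}^2, \sigma \in \mathbb{R}_+\}$ be a class of skew normal distribution. Then
$$\frac{\partial^2 f}{\partial\theta^2}(x|\theta,\sigma,m) - 2\frac{\partial f}{\partial\sigma^2}(x|\theta,\sigma,m) + \frac{m^3+m}{\sigma^2}\frac{\partial f}{\partial m}(x|\theta,\sigma,m) = 0.$$

Proof. Direct calculation yields
$$\frac{\partial^2 f}{\partial\theta^2}(x|\theta,\sigma,m) = \left\{\left(-\frac{2}{\sqrt{2\pi}\sigma^3} + \frac{2(x-\theta)^2}{\sqrt{2\pi}\sigma^5}\right)\Phi\left(\frac{m(x-\theta)}{\sigma}\right) - \frac{2m(m^2+2)(x-\theta)}{\sqrt{2\pi}\sigma^4} f\left(\frac{m(x-\theta)}{\sigma}\right)\right\}\exp\left(-\frac{(x-\theta)^2}{2\sigma^2}\right),$$
$$\frac{\partial f}{\partial\sigma^2}(x|\theta,\sigma,m) = \left\{\left(-\frac{1}{\sqrt{2\pi}\sigma^3} + \frac{(x-\theta)^2}{\sqrt{2\pi}\sigma^5}\right)\Phi\left(\frac{m(x-\theta)}{\sigma}\right) - \frac{m(x-\theta)}{\sqrt{2\pi}\sigma^4} f\left(\frac{m(x-\theta)}{\sigma}\right)\right\}\exp\left(-\frac{(x-\theta)^2}{2\sigma^2}\right),$$
$$\frac{\partial f}{\partial m}(x|\theta,\sigma,m) = \frac{2(x-\theta)}{\sqrt{2\pi}\sigma^2} f\left(\frac{m(x-\theta)}{\sigma}\right)\exp\left(-\frac{(x-\theta)^2}{2\sigma^2}\right),$$
where $f(\cdot)$ with a single argument denotes the standard normal density and $\Phi$ its distribution function.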


From these equations, we can easily verify the conclusion of our lemma. 


□ 
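The identity of Lemma 7.2 can also be checked numerically. The following is a minimal sketch, assuming the skew-normal density $f(x|\theta,\sigma,m) = \frac{2}{\sigma}\varphi\left(\frac{x-\theta}{\sigma}\right)\Phi\left(\frac{m(x-\theta)}{\sigma}\right)$ written in terms of $v = \sigma^2$; the function names and test points are illustrative choices, and derivatives are approximated by central finite differences.

```python
import math

def skew_normal_pdf(x, theta, v, m):
    """Skew-normal density (2/sigma) * phi(z) * Phi(m z), with sigma^2 = v."""
    sigma = math.sqrt(v)
    z = (x - theta) / sigma
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    Phi = 0.5 * (1.0 + math.erf(m * z / math.sqrt(2.0)))
    return 2.0 / sigma * phi * Phi

def lemma_residual(x, theta, v, m, h=1e-4):
    """Finite-difference value of
    d^2f/dtheta^2 - 2 df/dv + (m^3 + m)/v * df/dm, which should be near 0."""
    f = skew_normal_pdf
    d2_theta = (f(x, theta + h, v, m) - 2.0 * f(x, theta, v, m) + f(x, theta - h, v, m)) / h**2
    d_v = (f(x, theta, v + h, m) - f(x, theta, v - h, m)) / (2.0 * h)
    d_m = (f(x, theta, v, m + h) - f(x, theta, v, m - h)) / (2.0 * h)
    return d2_theta - 2.0 * d_v + (m**3 + m) / v * d_m

residuals = [lemma_residual(x, 0.1, 1.5, 0.7) for x in (-1.0, 0.3, 2.0)]
```

The residuals are of the order of the finite-difference error, while the individual terms are of order one, which is what the lemma predicts.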


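PROOF OF PROPOSITION 4.5. For any $k \ge 1$ and $k$ different tuples $(\theta_1,\sigma_1,m_1),\dots,(\theta_k,\sigma_k,m_k)$, let $\alpha_{ij} \in \mathbb{R}$ for $i = 1,\dots,4$, $j = 1,\dots,k$ such that for almost all $x \in \mathbb{R}$
$$\sum_{j=1}^{k} \alpha_{1j} f(x|\theta_j,\sigma_j,m_j) + \alpha_{2j}\frac{\partial f}{\partial\theta}(x|\theta_j,\sigma_j,m_j) + \alpha_{3j}\frac{\partial f}{\partial\sigma^2}(x|\theta_j,\sigma_j,m_j) + \alpha_{4j}\frac{\partial f}{\partial m}(x|\theta_j,\sigma_j,m_j) = 0.$$

We can rewrite the above equation as
$$\sum_{j=1}^{k}\left\{\left[\beta_{1j} + \beta_{2j}(x-\theta_j) + \beta_{3j}(x-\theta_j)^2\right]\Phi\left(\frac{m_j(x-\theta_j)}{\sigma_j}\right) + \left[\gamma_{1j} + \gamma_{2j}(x-\theta_j)\right] f\left(\frac{m_j(x-\theta_j)}{\sigma_j}\right)\right\}\exp\left(-\frac{(x-\theta_j)^2}{2\sigma_j^2}\right) = 0, \tag{46}$$
where $\beta_{1j} = \frac{2\alpha_{1j}}{\sqrt{2\pi}\sigma_j} - \frac{\alpha_{3j}}{\sqrt{2\pi}\sigma_j^3}$, $\beta_{2j} = \frac{2\alpha_{2j}}{\sqrt{2\pi}\sigma_j^3}$, $\beta_{3j} = \frac{\alpha_{3j}}{\sqrt{2\pi}\sigma_j^5}$, $\gamma_{1j} = -\frac{2\alpha_{2j}m_j}{\sqrt{2\pi}\sigma_j^2}$, and $\gamma_{2j} = -\frac{\alpha_{3j}m_j}{\sqrt{2\pi}\sigma_j^4} + \frac{2\alpha_{4j}}{\sqrt{2\pi}\sigma_j^2}$ for all $j = 1,\dots,k$. Now, we identify two scenarios in which the first order identifiability of skew-normal distribution fails to hold.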
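Case 1: There exists some $m_j = 0$ as $1 \le j \le k$. In this case, we choose $k = 1$, $m_1 = 0$. Equation (46) can be rewritten as
$$\frac{\beta_{11}}{2} + \frac{\gamma_{11}}{\sqrt{2\pi}} + \left(\frac{\beta_{21}}{2} + \frac{\gamma_{21}}{\sqrt{2\pi}}\right)(x-\theta_1) + \frac{\beta_{31}}{2}(x-\theta_1)^2 = 0.$$

By choosing $\alpha_{31} = 0$, $\alpha_{11} = 0$, $\alpha_{21} = -\frac{\sqrt{2}\sigma_1}{\sqrt{\pi}}\alpha_{41}$, the above equation always equals 0. Since $\alpha_{21}, \alpha_{41}$ are not necessarily zero, the first-order identifiability condition is violated.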


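Case 2: There exist two indices $1 \le i \ne j \le k$ such that $\left(\frac{\sigma_i^2}{1+m_i^2}, \theta_i\right) = \left(\frac{\sigma_j^2}{1+m_j^2}, \theta_j\right)$. Now, we choose $k = 2$, $i = 1$, $j = 2$. Equation (46) can be rewritten as
$$\sum_{j=1}^{2}\left[\beta_{1j} + \beta_{2j}(x-\theta_j) + \beta_{3j}(x-\theta_j)^2\right]\Phi\left(\frac{m_j(x-\theta_j)}{\sigma_j}\right)\exp\left(-\frac{(x-\theta_j)^2}{2\sigma_j^2}\right) + \frac{1}{\sqrt{2\pi}}\left(\sum_{j=1}^{2}\gamma_{1j} + \sum_{j=1}^{2}\gamma_{2j}(x-\theta_1)\right)\exp\left(-\frac{(m_1^2+1)(x-\theta_1)^2}{2\sigma_1^2}\right) = 0.$$

Now, we choose $\alpha_{1j} = \alpha_{2j} = \alpha_{3j} = 0$ for all $1 \le j \le 2$ and $\frac{\alpha_{41}}{\sigma_1^2} + \frac{\alpha_{42}}{\sigma_2^2} = 0$; then the above equation always holds. Since $\alpha_{41}$ and $\alpha_{42}$ need not be zero, first-order identifiability is again violated.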
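Both failure scenarios can be checked numerically. The sketch below assumes the skew-normal density $\frac{2}{\sigma}\varphi\left(\frac{x-\theta}{\sigma}\right)\Phi\left(\frac{m(x-\theta)}{\sigma}\right)$ and approximates partial derivatives by central differences. The coefficient choices are the ones obtained by setting the relevant terms of the rewritten equation to zero; the function names and parameter values are hypothetical.

```python
import math

def skew_normal_pdf(x, theta, sigma, m):
    z = (x - theta) / sigma
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    Phi = 0.5 * (1.0 + math.erf(m * z / math.sqrt(2.0)))
    return 2.0 / sigma * phi * Phi

def d_theta(x, theta, sigma, m, h=1e-6):
    return (skew_normal_pdf(x, theta + h, sigma, m) - skew_normal_pdf(x, theta - h, sigma, m)) / (2.0 * h)

def d_m(x, theta, sigma, m, h=1e-6):
    return (skew_normal_pdf(x, theta, sigma, m + h) - skew_normal_pdf(x, theta, sigma, m - h)) / (2.0 * h)

def case1_combination(x, theta, sigma, a4=1.0):
    """Case 1 (m = 0): a2 * df/dtheta + a4 * df/dm with a2 = -sqrt(2/pi) * sigma * a4."""
    a2 = -math.sqrt(2.0 / math.pi) * sigma * a4
    return a2 * d_theta(x, theta, sigma, 0.0) + a4 * d_m(x, theta, sigma, 0.0)

def case2_combination(x, theta, sigma1, m1, m2, a41=1.0):
    """Case 2: same theta and same sigma^2 / (1 + m^2); the weights satisfy
    a41 / sigma1^2 + a42 / sigma2^2 = 0."""
    sigma2 = sigma1 * math.sqrt((1.0 + m2**2) / (1.0 + m1**2))
    a42 = -a41 * sigma2**2 / sigma1**2
    return a41 * d_m(x, theta, sigma1, m1) + a42 * d_m(x, theta, sigma2, m2)

points = (-1.0, 0.5, 3.0)
case1_values = [case1_combination(x, 0.2, 1.3) for x in points]
case2_values = [case2_combination(x, 0.2, 1.0, 1.0, 2.0) for x in points]
```

In both cases a nonzero coefficient vector produces a combination of first derivatives that vanishes up to finite-difference error at every test point.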


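PROOF OF THEOREM 4.5. (a) According to the conclusion of Theorem 3.1, to get the conclusion of part (a), it is sufficient to demonstrate that for any $\alpha_{ij} \in \mathbb{R}$ ($1 \le i \le 4$, $1 \le j \le k_0$) such that for almost all $x \in \mathbb{R}$
$$\sum_{j=1}^{k_0} \alpha_{1j} f(x|\theta_j^0,\sigma_j^0,m_j^0) + \alpha_{2j}\frac{\partial f}{\partial\theta}(x|\theta_j^0,\sigma_j^0,m_j^0) + \alpha_{3j}\frac{\partial f}{\partial\sigma^2}(x|\theta_j^0,\sigma_j^0,m_j^0) + \alpha_{4j}\frac{\partial f}{\partial m}(x|\theta_j^0,\sigma_j^0,m_j^0) = 0,$$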


then aij = 0 for all 1 < i < 4 and 1 < i < ko- In fact, using the result from Proposition (14.11 ). we can 
rewrite the above equation as 


El - 9]) + hj{x - 9]f]^ 

i=i 


(7ii +72i(a; - 9j))f 


where hi = 


“3i 


2q;i j 


y/^aj yf^a] 


, hi - 



exp 


{x-9jf' 

MY 


-f 


(x - 9 ]Y 
MY 


= 0 , 


(47) 


yfhna] 


Hi = 


2a2jmj 

and 72 j = 


y/^i 


TTu; 


a^jmj 2aij 

y/^a] 


y/^a. 


for all 1 < j < fco- Denote a^ — 


j+ko I _j_ (77i0)2 


yfhYah 

^ for all 1 < j < k(). From the 


assumption that cr? are pairwise different and 


1 -f {mfY 


0 {(7)'" ^ 


1 < j < ^0 r for all 1 < z < ko 


we achieve a] are pairwise different as 1 < y < 2ko. The equation (1471 ) can be rewritten as 


E 


1=1 


[hj + hj{x - 9]) + hj{x 






{x- 9 ]Y\\ 

M^ Ji 


= 0 , 


(48) 
where = 0, = 9^, l3^j+ko) = ^^,/32(j+fco) = = 0 as /cq + 1 < j < 2A:o. 




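Denote $\bar{i} = \arg\max_{1 \le i \le 2k_0}\{\sigma_i^0\}$. Multiply both sides of (48) with $\exp\left(\frac{(x-\theta_{\bar i}^0)^2}{2(\sigma_{\bar i}^0)^2}\right)\Big/\Phi\left(\frac{m_{\bar i}^0(x-\theta_{\bar i}^0)}{\sigma_{\bar i}^0}\right)$ and let $x \to +\infty$ if $m_{\bar i}^0 > 0$ or let $x \to -\infty$ if $m_{\bar i}^0 < 0$ on both sides of the new equation; we obtain $\beta_{1\bar i} + \beta_{2\bar i}(x-\theta_{\bar i}^0) + \beta_{3\bar i}(x-\theta_{\bar i}^0)^2 \to 0$. It implies that $\beta_{1\bar i} = \beta_{2\bar i} = \beta_{3\bar i} = 0$. Keep repeating the same argument for the remaining $\sigma_i^0$ until we obtain $\beta_{1i} = \beta_{2i} = \beta_{3i} = 0$ for all $1 \le i \le 2k_0$. It is equivalent to $\alpha_{1i} = \alpha_{2i} = \alpha_{3i} = \alpha_{4i} = 0$ for all $1 \le i \le k_0$. This concludes the proof for part (a).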

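(b) In this section, we denote $v = \sigma^2$. Without loss of generality, we assume $m_1^0, m_2^0, \dots, m_{i_1}^0 = 0$ where $1 \le i_1 \le k_0$ denotes the largest index $i$ such that $m_i^0 = 0$. Denote $s_1 = i_1 + 1 < s_2 < \dots < s_{i_2} \in [i_1+1, k_0]$ such that $\left(\frac{v_j^0}{1+(m_j^0)^2}, \theta_j^0\right) = \left(\frac{v_l^0}{1+(m_l^0)^2}, \theta_l^0\right)$ and $m_j^0 m_l^0 > 0$ for all $s_i \le j, l \le s_{i+1}-1$, $1 \le i \le i_2 - 1$. From that definition, we have $|I_{s_i}| = s_{i+1} - s_i$ for all $1 \le i \le i_2 - 1$. In order to establish part (b) of Theorem 4.5 it suffices to show
$$\lim_{\epsilon\to 0}\ \inf_{G \in \mathcal{E}_{k_0}(\Theta\times\Omega)}\left\{\sup_{x \in \mathcal{X}} |p_G(x) - p_{G_0}(x)|/W_2^2(G,G_0) : W_2(G,G_0) \le \epsilon\right\} > 0. \tag{49}$$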


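Assume by contrary that (49) does not hold. It means that we can find a sequence $G_n \in \mathcal{E}_{k_0}(\Theta\times\Omega)$ such that $W_2(G_n,G_0) \to 0$ as $n \to \infty$ and for all $x \in \mathcal{X}$, $(p_{G_n}(x) - p_{G_0}(x))/W_2^2(G_n,G_0) \to 0$ as $n \to \infty$. Denote $G_n = \sum_{i=1}^{k_0} p_i^n \delta_{(\theta_i^n, v_i^n, m_i^n)}$ and assume that $(p_i^n, \theta_i^n, v_i^n, m_i^n) \to (p_i^0, \theta_i^0, v_i^0, m_i^0)$ for all $1 \le i \le k_0$. Denote $d(G_n,G_0) = \sum_{i=1}^{k_0} p_i^n(|\Delta\theta_i^n|^2 + |\Delta v_i^n|^2 + |\Delta m_i^n|^2) + |\Delta p_i^n|$ where $\Delta\theta_i^n = \theta_i^n - \theta_i^0$, $\Delta v_i^n = v_i^n - v_i^0$, $\Delta m_i^n = m_i^n - m_i^0$, and $\Delta p_i^n = p_i^n - p_i^0$ for all $1 \le i \le k_0$. According to the argument of the proof of Theorem 3.1, we have $(p_{G_n}(x) - p_{G_0}(x))/d(G_n,G_0) \to 0$ as $n \to \infty$ for all $x \in \mathcal{X}$. By means of Taylor expansion up to second order, we can write $(p_{G_n}(x) - p_{G_0}(x))/d(G_n,G_0)$ as the summation of four parts, which we denote by $A_{n,1}(x)$, $A_{n,2}(x)$, $A_{n,3}(x)$, and $A_{n,4}(x)$. Regarding $A_{n,4}(x)$, it is the remainder of Taylor expansion, which means as $n \to \infty$
$$A_{n,4}(x) = O\left(\sum_{i=1}^{k_0} p_i^n\left(|\Delta\theta_i^n|^{2+\delta} + |\Delta v_i^n|^{2+\delta} + |\Delta m_i^n|^{2+\delta}\right)\right)\Big/d(G_n,G_0) \to 0,$$
for some constant $\delta > 0$.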


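Regarding $A_{n,1}(x)$, $A_{n,2}(x)$, $A_{n,3}(x)$, these are linear combinations of $f$, $\frac{\partial f}{\partial\theta}$, $\frac{\partial f}{\partial v}$, $\frac{\partial f}{\partial m}$, $\frac{\partial^2 f}{\partial\theta^2}$, $\frac{\partial^2 f}{\partial v^2}$, $\frac{\partial^2 f}{\partial m^2}$, $\frac{\partial^2 f}{\partial\theta\partial v}$, $\frac{\partial^2 f}{\partial\theta\partial m}$, $\frac{\partial^2 f}{\partial v\partial m}$, all evaluated at $(x|\theta_i^0, v_i^0, m_i^0)$. However, in $A_{n,1}(x)$, the index $i$ ranges from 1 to $i_1$ while in $A_{n,2}(x)$ and $A_{n,3}(x)$ the index $i$ ranges from $i_1 + 1$ to $s_{i_2} - 1$ and from $s_{i_2}$ to $k_0$, respectively.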


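Regarding $A_{n,3}(x)$, we denote $B_{\alpha_1\alpha_2\alpha_3}(\theta_i^0, v_i^0, m_i^0)$ to be the coefficient of $\frac{\partial^{\alpha} f}{\partial\theta^{\alpha_1}\partial v^{\alpha_2}\partial m^{\alpha_3}}(x|\theta_i^0, v_i^0, m_i^0)$ for any $s_{i_2} \le i \le k_0$, $0 \le \alpha \le 2$ and $\alpha_1 + \alpha_2 + \alpha_3 = \alpha$, $\alpha_j \ge 0$ for all $1 \le j \le 3$.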




Regarding An Ax), the structure 


X° 7;0 

^ ,0?) = (' ^ 


1 + (m‘-)2’ -t 


1 + (mj*) 


0'i2 ’ 


0 °) for all Si < j,l < Sj+i — 1 , 
$1 \le i \le i_2 - 1$, allows us to rewrite $A_{n,2}(x)$ as

*2 — 1 ( •Si+1-1 

An, 2 {x) = X] 1 ^ [“ij'i + “ 2 i*(a^ - ) + «3i*(a^ - + «4i*(3^ “ + «5i*(a^ - )1 X 

j = Si 


2=1 


x-O^.^ mPjx-6^X 






+ [/ 3 r* + / 32 *(^ - Ol) + Plix - 9lf + Pl{x - 00 ^ 3 ] X 

{m%f + 1 


exp 


2vl 


-ix-9l) 


0 ^2 


1 X^ 

where f(x) = _ exp(-). Moreover, d{Gn,Go)af ■■ is a linear eombination of elements of 

\/27r 2 ^ 

Ap”, (A 6 *”)“i (Anj)“2 for eaeh i = 1,... , i 2 —1> -s* < J < Sj+i — 1, 1 < /i < 5 and 1 < ai+Q ;2 < 2. 

•Si+l —1 

n'laa 

j ■ 


Additionally, d{Gn,GQ)f3f^^ is a linear eombination of elements of ^ (A0”)"i (An ”)“2 (Am 


3=Si 


for eaeh 1 < /2 < 4, 1 < i < i 2 — 1, and 1 < ai + a 2 + 03 < 2. The detailed formula of 
d{Gn, Go)a]'^j-, d{Gn, Go)/3(^i are given in Appendix II. 

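Regarding $A_{n,1}(x)$, the structure $m_1^0, m_2^0, \dots, m_{i_1}^0 = 0$ allows us to rewrite $A_{n,1}(x)$ as
$$A_{n,1}(x) = \sum_{j=1}^{i_1}\left[\gamma_{1j}^n + \gamma_{2j}^n(x-\theta_j^0) + \gamma_{3j}^n(x-\theta_j^0)^2 + \gamma_{4j}^n(x-\theta_j^0)^3 + \gamma_{5j}^n(x-\theta_j^0)^4\right] f\left(\frac{x-\theta_j^0}{\sigma_j^0}\right),$$
where $d(G_n,G_0)\gamma_{ij}^n$ is a linear combination of elements of $\Delta p_j^n$, $(\Delta\theta_j^n)^{\alpha_1}(\Delta v_j^n)^{\alpha_2}(\Delta m_j^n)^{\alpha_3}$ for all $1 \le j \le i_1$ and $\alpha_1 + \alpha_2 + \alpha_3 \le 2$. The detailed formulae of $d(G_n,G_0)\gamma_{ij}^n$ are in Appendix II.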

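Now, suppose that all $\gamma_{ij}^n$ ($1 \le i \le 5$, $1 \le j \le i_1$), $\beta_{ij}^n$ ($1 \le i \le 4$, $1 \le j \le i_2 - 1$), $\alpha_{ijl}^n$ ($1 \le i \le 5$, $s_l \le j \le s_{l+1} - 1$, $1 \le l \le i_2 - 1$), $B_{\alpha_1\alpha_2\alpha_3}(\theta_i^0, v_i^0, m_i^0)$ (for $\alpha_1 + \alpha_2 + \alpha_3 \le 2$) go to 0 as $n \to \infty$. We can find at least one index $1 \le i^* \le k_0$ such that $(|\Delta p_{i^*}^n| + p_{i^*}^n(|\Delta\theta_{i^*}^n|^2 + |\Delta v_{i^*}^n|^2 + |\Delta m_{i^*}^n|^2))/d(G_n,G_0) \not\to 0$ as $n \to \infty$. Define $d(p_{i^*}^n, \theta_{i^*}^n, v_{i^*}^n, m_{i^*}^n) = |\Delta p_{i^*}^n| + p_{i^*}^n(|\Delta\theta_{i^*}^n|^2 + |\Delta v_{i^*}^n|^2 + |\Delta m_{i^*}^n|^2)$. There are three possible cases for $i^*$: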


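Case 1: $1 \le i^* \le i_1$. Since $d(p_{i^*}^n, \theta_{i^*}^n, v_{i^*}^n, m_{i^*}^n)/d(G_n,G_0) \not\to 0$, we obtain that for all $1 \le j \le 5$
$$C_j^n := \frac{d(G_n,G_0)}{d(p_{i^*}^n, \theta_{i^*}^n, v_{i^*}^n, m_{i^*}^n)}\,\gamma_{ji^*}^n \to 0 \text{ as } n \to \infty.$$
Within this scenario our argument is organized into four steps.

Step 1.1: We can argue that $\Delta\theta_{i^*}^n, \Delta v_{i^*}^n, \Delta m_{i^*}^n \ne 0$ for infinitely many $n$. The detailed argument is left to Appendix II.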


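Step 1.2: If $|\Delta\theta_{i^*}^n|$ is the maximum among $|\Delta\theta_{i^*}^n|$, $|\Delta v_{i^*}^n|$, $|\Delta m_{i^*}^n|$ for infinitely many $n$, then we can assume that it holds for all $n$. Denote $\Delta v_{i^*}^n = k_1^n\Delta\theta_{i^*}^n$ and $\Delta m_{i^*}^n = k_2^n\Delta\theta_{i^*}^n$ where $k_1^n, k_2^n \in [-1,1]$. Assume that $k_1^n \to k_1$ and $k_2^n \to k_2$ as $n \to \infty$. As $C_5^n \to 0$, dividing both the numerator and denominator by $(\Delta\theta_{i^*}^n)^2$, we obtain that as $n \to \infty$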


lApfil 

(A0”)2 


jk?? _ 

+ 1 + (Af )2 + (A^)2 


0 . 


(50) 
If 


|Ap; 


(A0-)' 


oo as n —)• oo, then Cf + 7 ^ 0 as n 

|Ap- 


00 , which is a contradiction to the 


fact that Cf, —)• 0 as n —00. Therefore, 


(A 0 ; 


m \2 


7^ OO as n —)• OO. Combining this result with 


|l, we obtain fci = 0. Similarly, by dividing both the numerator and denominator of and C^, we 

1 2k2 1 ^2 

obtain the following equations — _ H-= 0 and _ H-= 0. These equations imply 

" vr \/ 27 ra 9 . vr 


2V^a% 

that 1 /cT?* = 0 , which is a contradiction. 


Step 1.3 : If I Auf, I is the maximum among | A 0 p» |, | Auf* |, | Amf* | for infinitely many n, then we can 

I I 

assume that it holds for all n. However, the formation of C? implies that ——00 as n —00. It 


{Av^j 


again leads to Cf + 74 0 as n —)• 00 , which is a contradiction. 


Step 1.4 : If I Am”* I is the maximum among |A 0 [l|, |Au^|, |Am”*| for infinitely many n, then we 
can assume that it holds for all n. Denote A 9 f, = k^Amf* and Avf* = Am”*. Let k^ k^ and 

k2 ki- With the same argument as the case | A0^ | is the maximum, we obtain k^ = 0. By dividing 
both the numerator and denominator of and CJ by (Amf*)^, we obtain the following equations 


fcs 1 2k‘i 

H— = 0 and-h 


kl 




vr 


vr 


2^f^G% 


= 0 , for which there is no real solution. 


In sum, Case 1 cannot happen.


Case 2: $s_1 \le i^* \le s_{i_2} - 1$. Without loss of generality, we assume that $s_1 \le i^* \le s_2 - 1$. Denote


S 2-1 




3=si 


Since d{p2*,0^,,v^,,m^,)/d{Gn,Go) 7 ^ 0, we have dnewip2*,di*,v]i,m2,)/d{Gn,Go) ^ 0 as n 
00 . Therefore, for 1 < j < 5 and si < z < S 2 — 1, 


J^n _ d{Gn,Go) 

^ ' dnew(pr*,0r*,<*,m”) 


0 as n —00. 


Our argument is organized into three steps. 

Step 2.1: From D 2 and D^, we obtain p”A0” / dnew (p^* > ^1* > ) 

i < S 2 — I- Combining with D” and D^, we achieve 

Ap”/ dnew (pr*, 61”*, u”* , m”* ), p” Au”/dnew (p^*, 6'!'*, , m”* ) ^ 0 as n ^ CX) 

Therefore, we also have p” ( Ad”)^/dnew (pr* > 0 p” (u”)^/dnew (pr* > ^1** > 

) for all Si < z < S 2 — 1. These results show that 


0 as n —)• oo 


0 as n —)• oo for all si < 

for all Si < Z < S2 — 1. 

'';”*,m”*)^ 


Un = 


S2-1 

j = Sl 


.m 


.n\2 


/dnev/{Pi*,G?*,vf,,mi*) 74 0 as n 


00 . 
Step 2.2: Now for $1 \le j \le 4$, we also have


:= 


d{Gn,Go) 


-^--/ 3 ”i —)■ 0 as n —)■ oo. 

j ( n an n 
i^new \Pi* 1 i "‘'i* / 


Since p^Av^/dnevjiPi*,di*,Vi,,mi,) and p^{Aviy/dm^iPi* ,di*,Vi* ,171^,) 8 ° to 0 as n 
Si < i < S 2 — 1 , we obtain that as n —)■ oo 


7r(cr°)® 


i=si 


/dnew (^r* > di* > ) -^ 0> 


and 


p'jiim^y + 2m^)(Ai;”''^ 

j = Si 


87r(cj' 




/c^new {Pi*, di*, njl, m”*) ^ 0. 


Combining these results with E2, we have 


K, = 


^P"m%A 
j = si 


m 


n\2 


/rfnew br* ) d2* , Vi* , fry* ) -> 0. 


OO for all 


Step 2.3 : As [/„ 7 ^ 0 as n ^ oo, we obtain 


S2 —1 S2 —1 

Vn/Un = Y. p^m^iAwyfl E ^ 0- 

f=«l 3=S1 


(51) 


Sinee > 0 for all si < i, j < S 2 — 1> without loss of generality we assume that > 0 for all 

Si < j < S 2 — 1. However, it implies that 


E P^m]{AmYl E E Pli^^lf! E pK^^D^’ (52) 


j=si 


J=S1 


3=si 


J=si 


whieh means min I m2 f = 0. This is a eontradietion. In sum. Case 2 eannot happen. 

5i<JI<52—1 I ^ ) 


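Case 3: $s_{i_2} \le i^* \le k_0$. Since $d(p_{i^*}^n, \theta_{i^*}^n, v_{i^*}^n, m_{i^*}^n)/d(G_n,G_0) \not\to 0$, we obtain
$$\tau(p_{i^*}^n, \theta_{i^*}^n, v_{i^*}^n, m_{i^*}^n)/d(G_n,G_0) \not\to 0 \text{ as } n \to \infty,$$
where $\tau(p_{i^*}^n, \theta_{i^*}^n, v_{i^*}^n, m_{i^*}^n) = |\Delta p_{i^*}^n| + p_{i^*}^n(|\Delta\theta_{i^*}^n| + |\Delta v_{i^*}^n| + |\Delta m_{i^*}^n|) \gtrsim d(p_{i^*}^n, \theta_{i^*}^n, v_{i^*}^n, m_{i^*}^n)$. As a consequence, for any $\alpha_1 + \alpha_2 + \alpha_3 \le 1$, as $n \to \infty$
$$\frac{d(G_n,G_0)}{\tau(p_{i^*}^n, \theta_{i^*}^n, v_{i^*}^n, m_{i^*}^n)}\,B_{\alpha_1\alpha_2\alpha_3}(\theta_{i^*}^0, v_{i^*}^0, m_{i^*}^0) \to 0.$$

However, from the proof of part (a), at least one of the above coefficients does not go to 0, which is a contradiction. Therefore, Case 3 cannot happen either.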

Summarizing from the arguments with the three cases above, we conclude that not all of $\gamma_{ij}^n$ ($1 \le i \le 5$, $1 \le j \le i_1$), $\beta_{ij}^n$ ($1 \le i \le 4$, $1 \le j \le i_2 - 1$), $\alpha_{ijl}^n$ ($1 \le i \le 5$, $s_l \le j \le s_{l+1} - 1$, $1 \le l \le i_2 - 1$), $B_{\alpha_1\alpha_2\alpha_3}(\theta_i^0, v_i^0, m_i^0)$ ($\alpha_1 + \alpha_2 + \alpha_3 \le 2$) go to 0 as $n \to \infty$. Denote $m_n$ to be the maximum of the absolute values of these coefficients and $d_n = 1/m_n$. Then, $d_n\alpha_{ijl}^n \to \alpha_{ijl}$ for all $1 \le i \le 5$, $s_l \le j \le s_{l+1} - 1$, $1 \le l \le i_2 - 1$, $d_n\beta_{ij}^n \to \beta_{ij}$ for all $1 \le i \le 4$, $1 \le j \le i_2 - 1$, $d_n\gamma_{ij}^n \to \gamma_{ij}$ for all $1 \le i \le 5$, $1 \le j \le i_1$, and $d_n B_{\alpha_1\alpha_2\alpha_3}(\theta_i^0, v_i^0, m_i^0) \to \lambda_{\alpha_1\alpha_2\alpha_3 i}$ for all $s_{i_2} \le i \le k_0$. Therefore, by letting $n \to \infty$, we obtain for all $x \in \mathbb{R}$ that


dnjPGr^jx) - PGo{x)) 

d(G„,Go) 


Ai{x) + A 2 {x) + ^ 3 ( 2 :) = 0, 


where Ai{x) = ^ (y1 lij{x - O^Y ^ ] f 

j=i \i=i 


X 


-0? 


22 — 1 ^ —1 5 




■’'.-42(x)=E E Eaw(i 

1=1 I j=Si i=l 


Xf 


X - 00 


$ 


m^(x — 9g) 


cry 


cry 


+ E ^ilix - d^iY ^exp - 




i=l 


fco 


^ 3 ( 3 ;) = E E ^ 

*=% l«l<2 


d\Af 




-{x-elY 


>, and 




(x|0O,z;O,mO). 


Using the same argument as that of part (a), we obtain $\alpha_{ijl} = 0$ for all $1 \le i \le 5$, $s_l \le j \le s_{l+1} - 1$, $1 \le l \le i_2 - 1$, $\beta_{ij} = 0$ for all $1 \le i \le 4$, $1 \le j \le i_2 - 1$, and $\gamma_{ij} = 0$ for all $1 \le i \le 5$ and $1 \le j \le i_1$.

However, we do not have $\lambda_{\alpha_1\alpha_2\alpha_3 i} = 0$ for all $s_{i_2} \le i \le k_0$ and $0 \le |\alpha| \le 2$. It comes from the identity in Lemma 7.2, which implies that all $\frac{\partial^{|\alpha|} f}{\partial\theta^{\alpha_1}\partial v^{\alpha_2}\partial m^{\alpha_3}}(x|\theta_i^0, v_i^0, m_i^0)$ are not linearly independent as $0 \le |\alpha| \le 2$. Therefore, this case needs a new treatment, which is divided into three steps.


Step F.1: From the definition of $m_n$, at least one coefficient $\alpha_{ijl}$, $\beta_{ij}$, $\gamma_{ij}$, $\lambda_{\alpha_1\alpha_2\alpha_3 i}$ equals 1. As all $\alpha_{ijl}$, $\beta_{ij}$, $\gamma_{ij}$ equal 0, this result implies that at least one coefficient $\lambda_{\alpha_1\alpha_2\alpha_3 i}$ equals 1. Therefore,


nir 

0 , 


= LB. 


B. 


01020^3 


(9? 


(00,, z;0 , mO 


3^2' 2 

yO,, mY, 

2 2 


for some a*, > “3 


and 


< i' < Zcq. As A0” , An”, Am7 


when 01+02 + 03 = 2 is dominated by \B, 


a.\a.20Lz 


( 00 ,, z; 0 ,, mO,) I when 


oi + 02 + 03 < 1 .Therefore, o, + On + o. < 1 , i.e, at most first order derivative. 


StepF.2: A^{pG^{x)-pGa{x))/W^{Gn,GQ) 0, we also have (pG„( 3 :)-PGo(a;))/ff'i(Gn, Go) 

0. From here, by applying Taylor expansion up to first order, we can write {pg^ {x)—PGo {x))/Wi {Gn, Gq) 
as i(x) + Ln^ 2 {x) + 3 (x) + L„ 4 (x) where 4 (x) is Taylor’s remainder term, which means 

that L„^ 4 (x)/VFi(G„, Go) —)• 0. Additionally. Lnp{x), Ln, 2 {x), Ln, 3 {x) are the linear combinations 


of elements of /(x|0O, z;0, mO), ^(x|0O, m?), ^(x|0,^ z;?, m?), ^(x|0O, nO, m?). In Ln,i{x), 

the index z ranges from 1 to zi while in Ln, 2 {x),Ln, 3 , the index z ranges from zi + 1 to — 1 and from 
to ko respectively. Assume that all of these coefficients go to 0 as n —>• + 00 , then we have 




.( 0 ' 
1 A , 


2 ' 2 


mO,)|d(G,„Go)/IUi(G„,Go)^0, 


(53) 


where the limit is due to the fact that \Ba*a*a* j ) "z^ )|'^(Gn, Gq) /Wi{Gn, Gq) is the maximum 
coefficient of L„ i(x),+^, 2 ( 2 ;), B„^ 3 (x). However, from the result of the proof of Theorem 13.11 we 
have 


ko 


W,{Gn,Go) < ^p7{\A6Y\ + \Avf\ + |Am”|) + |Ap” 


< 


2 = 1 


max {|Ap”|,|A0”|,|Az;”l,Am”|} 

l<t<ko 


= \ Bala* a* (6*° , <7° , "Z°, ) | cZ(Gn, Gq ), 


which contradicts (53). Therefore, at least one coefficient does not vanish.
Step F.3: Denote $m_n'$ to be the maximum among the absolute values of these coefficients and $d_n' = 1/m_n'$. Then, we achieve


,V^, ,m^,)\d{Gn,Go)/Wi{Gn,Go) = 1 for all n. 


Therefore, as n —oo 
3 ko 


i=l i=l * 


df 


+a'^i^{x\e'l,v'l,ml) \ = 0 . 


j ^ 

' dm ' 

where one of $\alpha'_{1i'}$, $\alpha'_{2i'}$, $\alpha'_{3i'}$, $\alpha'_{4i'}$ differs from 0. However, using the same argument as that of part (a), this equation will imply that $\alpha'_{ji} = 0$ for all $1 \le j \le 4$ and $s_{i_2} \le i \le k_0$, which is a contradiction.

We have reached the conclusion (49), which completes the proof.

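Best lower bound of $V(p_G, p_{G_0})$ as $G_0$ satisfies condition (S.2): We have two cases.

Case b.1: There exists $m_i^0 = 0$ for some $1 \le i \le k_0$. Without loss of generality, we assume $m_1^0 = 0$. We construct the sequence $G_n \in \mathcal{E}_{k_0}(\Theta\times\Omega)$ such that $(\Delta p_i^n, \Delta\theta_i^n, \Delta v_i^n, \Delta m_i^n) = (0,0,0,0)$ for all $2 \le i \le k_0$ and $\Delta p_1^n = \Delta v_1^n = 0$, $\Delta\theta_1^n = \frac{1}{n}$, $\Delta m_1^n = -\frac{\sqrt{\pi}}{\sqrt{2}\sigma_1^0 n}$. With this construction, we can check that $\Delta\theta_1^n + \sqrt{2}\sigma_1^0\Delta m_1^n/\sqrt{\pi} = 0$. Using the same argument as that of part (b) of the proof of Theorem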


with the notice that V{pg„^PGq) = J \R{x)\dx where R{x) is Taylor expansion’s remainder in 

K 

the first order, we readily achieve the conclusion of our theorem. 


Case b.2: There exists a conformant cousin set $I_i$ for some $1 \le i \le k_0$. Without loss of generality, we assume $i = 1$ and $j = 2 \in I_1$. Now, we choose $G_n$ such that $\Delta p_i^n = \Delta\theta_i^n = \Delta v_i^n = 0$ for all $1 \le i \le k_0$, $\Delta m_i^n = 0$ for all $3 \le i \le k_0$, $\Delta m_1^n = \frac{1}{n}$, $\Delta m_2^n = -\frac{v_2^0}{n v_1^0}$. Then, we can guarantee that $\Delta m_1^n/v_1^0 + \Delta m_2^n/v_2^0 = 0$. By means of Taylor expansion up to first order, we can check that $V(p_{G_n}, p_{G_0}) \lesssim \int_{\mathbb{R}} |R(x)|\,dx$ where $R(x)$ is the Taylor remainder. From then, using the same argument as case b.1, we get the conclusion of our theorem.

Remark: With extra hard work, we can also prove that $W_2^2$ is the best lower bound of $h(p_G, p_{G_0})$ as $G_0$ satisfies condition (S.2). Therefore, for any standard estimation method (such as the MLE) which yields a convergence rate for $p_G$, the induced rate of convergence for the mixing measure $G$ is minimax optimal under $W_2$ when $G_0$ satisfies condition (S.2), while it is minimax optimal under $W_1$ when $G_0$ satisfies condition (S.1).

PROOF OF THEOREM gj] This proof is quite similar to that of Theorem 14.11 so we shall give 
only a sketch. It is sufficient to demonstrate that 


$$\lim_{\epsilon\to 0}\ \inf_{G \in \mathcal{O}_k(\Theta\times\Omega)}\left\{\sup_{x \in \mathcal{X}} |p_G(x) - p_{G_0}(x)|/W_m^m(G,G_0) : W_m(G,G_0) \le \epsilon\right\} > 0. \tag{54}$$





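Assume by contrary that (54) does not hold. Here, we assume $r$ is even (the case $r$ is an odd number can be addressed in the same way). In this case, $m = r$. Denote $v = \sigma^2$. Then, there is a sequence $G_n = \sum_{i=1}^{k_0}\sum_{j=1}^{s_i} p_{ij}^n\delta_{(\theta_{ij}^n, v_{ij}^n, m_{ij}^n)}$ such that $(\theta_{ij}^n, v_{ij}^n, m_{ij}^n) \to (\theta_i^0, v_i^0, m_i^0)$ for all $1 \le i \le k_0$, $1 \le j \le s_i$.

Define
$$d(G_n,G_0) = \sum_{i=1}^{k_0}\sum_{j=1}^{s_i} p_{ij}^n\left(|\Delta\theta_{ij}^n|^r + |\Delta v_{ij}^n|^r + |\Delta m_{ij}^n|^r\right) + \sum_{i=1}^{k_0}|p_{i\cdot}^n - p_i^0|,$$
where $p_{i\cdot}^n = \sum_{j=1}^{s_i} p_{ij}^n$, $\Delta\theta_{ij}^n = \theta_{ij}^n - \theta_i^0$, $\Delta v_{ij}^n = v_{ij}^n - v_i^0$, $\Delta m_{ij}^n = m_{ij}^n - m_i^0$. Now, applying Taylor's expansion up to $r$-th order, we obtain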

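$$
\begin{aligned}
p_{G_n}(x) - p_{G_0}(x) &= \sum_{i=1}^{k_0}\sum_{j=1}^{s_i} p_{ij}^n \sum_{1 \le |\alpha| \le r} \frac{(\Delta\theta_{ij}^n)^{\alpha_1}(\Delta v_{ij}^n)^{\alpha_2}(\Delta m_{ij}^n)^{\alpha_3}}{\alpha_1!\,\alpha_2!\,\alpha_3!}\frac{\partial^{|\alpha|} f}{\partial\theta^{\alpha_1}\partial v^{\alpha_2}\partial m^{\alpha_3}}(x|\theta_i^0, v_i^0, m_i^0) \\
&\quad + \sum_{i=1}^{k_0}(p_{i\cdot}^n - p_i^0) f(x|\theta_i^0, v_i^0, m_i^0) + R_1(x) := A_1(x) + B_1(x) + R_1(x),
\end{aligned}
$$
where $\alpha = (\alpha_1, \alpha_2, \alpha_3)$, $R_1(x)$ is the Taylor remainder and $R_1(x)/d(G_n,G_0) \to 0$.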
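Now we invoke the key identity (cf. Lemma 7.2)
$$\frac{\partial f}{\partial v}(x|\theta,v,m) = \frac{1}{2}\frac{\partial^2 f}{\partial\theta^2}(x|\theta,v,m) + \frac{m^3+m}{2v}\frac{\partial f}{\partial m}(x|\theta,v,m).$$
It follows by induction that, for any $\alpha_2 \ge 1$
$$\frac{\partial^{\alpha_2} f}{\partial v^{\alpha_2}} = \frac{1}{2^{\alpha_2}}\frac{\partial^{2\alpha_2} f}{\partial\theta^{2\alpha_2}} + \sum_{i=1}^{\alpha_2}\frac{1}{2^{\alpha_2-i}}\frac{\partial^{i-1}}{\partial v^{i-1}}\left(\frac{m^3+m}{2v}\frac{\partial^{2(\alpha_2-i)+1} f}{\partial\theta^{2(\alpha_2-i)}\partial m}\right).$$
Therefore, for any $\alpha = (\alpha_1, \alpha_2, \alpha_3)$ such that $\alpha_2 \ge 1$, we have
$$\frac{\partial^{|\alpha|} f}{\partial\theta^{\alpha_1}\partial v^{\alpha_2}\partial m^{\alpha_3}} = \frac{1}{2^{\alpha_2}}\frac{\partial^{\alpha_1+2\alpha_2+\alpha_3} f}{\partial\theta^{\alpha_1+2\alpha_2}\partial m^{\alpha_3}} + \sum_{i=1}^{\alpha_2}\frac{1}{2^{\alpha_2-i}}\frac{\partial^{\alpha_1+\alpha_3+i-1}}{\partial\theta^{\alpha_1}\partial m^{\alpha_3}\partial v^{i-1}}\left(\frac{m^3+m}{2v}\frac{\partial^{2(\alpha_2-i)+1} f}{\partial\theta^{2(\alpha_2-i)}\partial m}\right).$$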


Continue applying this identity until the right hand side of this equation only contains derivatives in terms of $\theta$ and $m$, which means all the derivatives involving $v$ can be reduced to derivatives with only $\theta$ and $m$. As a consequence, $A_1(x)/d(G_n,G_0)$ is a linear combination of elements of $\frac{\partial^{|\beta|} f}{\partial\theta^{\beta_1}\partial m^{\beta_2}}(x|\theta_i^0, v_i^0, m_i^0)$ where $0 \le |\beta| \le 2r$ (not necessarily all the values of $\beta$ in this range). We can check that for each $\gamma = 1, \dots, 2r$, the coefficient of $\frac{\partial^{\gamma} f}{\partial\theta^{\gamma}}(x|\theta_i^0, v_i^0, m_i^0)$ is


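$$E_\gamma(\theta_i^0, v_i^0, m_i^0) = \left[\sum_{j=1}^{s_i} p_{ij}^n \sum_{\substack{n_1 + 2n_2 = \gamma \\ n_1 + n_2 \le r}} \frac{(\Delta\theta_{ij}^n)^{n_1}(\Delta v_{ij}^n)^{n_2}}{2^{n_2}\,n_1!\,n_2!}\right]\Big/ d(G_n,G_0).$$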
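To make the coefficient concrete, the sketch below evaluates the numerator of $E_\gamma$, read as $\sum_j p_{ij}^n \sum_{n_1+2n_2=\gamma,\ n_1+n_2\le r} (\Delta\theta_{ij}^n)^{n_1}(\Delta v_{ij}^n)^{n_2}/(2^{n_2} n_1! n_2!)$, in exact rational arithmetic. As an illustrative assumption it uses two atoms with equal weights, $\Delta\theta = \pm 1/n$ and $\Delta v = -1/n^2$, a symmetric configuration of the kind used for the case $k - k_0 = 1$; the function name and the numbers are hypothetical.

```python
from fractions import Fraction
from math import factorial

def e_gamma_numerator(gamma, atoms, r):
    """Sum over atoms (p, dtheta, dv) and over pairs (n1, n2) with
    n1 + 2*n2 == gamma and n1 + n2 <= r of
    p * dtheta**n1 * dv**n2 / (2**n2 * n1! * n2!)."""
    total = Fraction(0)
    for p, dtheta, dv in atoms:
        for n2 in range(gamma // 2 + 1):
            n1 = gamma - 2 * n2
            if n1 + n2 > r:
                continue
            total += p * dtheta**n1 * dv**n2 / (2**n2 * factorial(n1) * factorial(n2))
    return total

n = 10
atoms = [
    (Fraction(1, 4), Fraction(1, n), Fraction(-1, n * n)),
    (Fraction(1, 4), Fraction(-1, n), Fraction(-1, n * n)),
]
coefficients = {g: e_gamma_numerator(g, atoms, r=4) for g in range(1, 5)}
```

For this configuration the numerators for $\gamma = 1, 2, 3$ vanish exactly, while the one for $\gamma = 4$ equals $-1/240000$, that is, it is of order $n^{-4}$ and cannot be cancelled by this choice.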


Qr j 

Additionally, the coefficient of the r-th order derivative with respect to m, --— ={x\9^,v^,m^), is 

am’’ 

^ p'^AAmijY/d{Gn, Go). Therefore, if all of the coefficients of Ai{x)/d{Gn, Go), Bi{x) / d{Gn,Go) 
i=i 

go to 0 , then as r is even, we obtain ^ |Amjj|’'/d(Gn, Go) — 0 for all 1 < z < ko and 

i=i 

ko 

£ \p?. ~ P^\/d{Gn, Go) 0. It implies that 
i=l 


ko Si 

+\^m,n/diGn,dG,) ^ 1 . 

i=l j=l 


Therefore, we can find an index z* G {1,..., fco} such that ^ pf* (lA^Jl + | Ami*j|^)/(i(Gn, dco) 'A 


j=i 


0. By multiply this term with , n?*, ) as 1 < 7 < r, we obtain 


Epf-j E 

i=i 


ni+2n2=7 
ni+n 2 <r 


2"-2ni!n2! 


/y^p7(|A»: 

i=i 


n ir 
i*j\ 


+ |Ami*j 


which is a contradiction due to the proof of Theorem 4.1. Therefore, not all the coefficients of $A_1(x)$, $B_1(x)$ go to 0. As a consequence, for all $x \in \mathbb{R}$, $(p_{G_n}(x) - p_{G_0}(x))/d(G_n,G_0)$ converges to a linear combination of $\frac{\partial^{|\beta|} f}{\partial\theta^{\beta_1}\partial m^{\beta_2}}(x|\theta_i^0, v_i^0, m_i^0)$ where at least one coefficient differs from 0. However, due to Assumption (S.1) on $G_0$, the collection of $\frac{\partial^{|\beta|} f}{\partial\theta^{\beta_1}\partial m^{\beta_2}}(x|\theta_i^0, v_i^0, m_i^0)$ is linearly independent, which is a contradiction. This concludes our proof.

The following addresses the remarks following the statement of Theorem 4.7.


Best lower bound when $k - k_0 = 1$: The remark regarding the removal of the constraint $\mathcal{O}_{k,c_0}$ is immediate from (the proof of) Proposition 4.3. To show the bound is sharp in this case, we construct the sequence $G_n = \sum_{i=1}^{k_0}\sum_{j=1}^{s_i} p_{ij}^n\delta_{(\theta_{ij}^n, v_{ij}^n, m_{ij}^n)}$ as follows: $s_1 = 2$, $s_i = 1$ for all $2 \le i \le k_0$, $p_{11}^n = p_{12}^n = p_1^0/2$, $\Delta\theta_{11}^n = 1/n$, $\Delta\theta_{12}^n = -1/n$, $\Delta v_{11}^n = \Delta v_{12}^n = -1/n^2$, $\Delta m_{11}^n = \Delta m_{12}^n = a_n$ where $a_n$ is the solution of the following equation


^2 

n'^Vi 


( 3(m°)^ + 1 

V 


3(m5)^ 

n'^iv^Y J 


an T 


{miY + 


n^{viY 

{rriiY + 

2 0 
n^Vi 


+ 


{m^Y + "3]* 


Tz^(z;j’)^ 

mi{{miY + 1 )^ 
4n^(z;5’)^ 


+ 


= 0 , 


which has the solution when n is sufficiently large. Additionally, |a„| >c 1/rz^ —0 when n —>• 00 . The 
choice of will be discussed in the sequel. Now, for any 1 < r < 4, we have W[{Gn, Go) > l/re’’. 
By using Taylor expansion up to the fourth order, we can write {pcY^) ~ PgY^))/^{{G n^Go) 

as the linear combination of the first part, which consists of ——(x|0°, n?, m°), z;°, m°). 


df 


(x|0°,z;°,mO), 


d^f 


{x\9Y vy,m'. 




d^f 


(x| 0 ?, X?, m?) plus the second part, which consists of 


50 ^ ' dOdm" ' dOdv^ ' *' 

the remaining derivatives and the Taylor remainder. Note that the second part always converges to 0. 

For i = 2,... ,ko, the coefficients of the derivatives in the first part are 0, thanks to our construction of 

Q J 

Gn- Thus, only the case left is when z = 1. By direct computation, the coefficient of — — {x\9Y m-5) 
is


(m5)3 + m? 2 I + 

Z^PiA^f^ii) + —— Z^Piii^f^ii) 


2 v^ 


2=1 


12 


3(m5)^ + 1 


^— Z^Pii[^(^ii) - Z^Pi 

1 i—l V 1/ 


i=l 
2 2 


±pnAe’;.)HAvi,f + ^*(Aer.)^A„r.A» 


2=1 


2 = 1 


3m5 


2?;? ^ 

^ 2=1 


)2 + ^pnAml = 0, 


2=1 


where the equality is due to the fact that the left hand side of this equation is equal to the left hand side of equation (55). Therefore, the choice of $a_n$ is to guarantee the coefficient of $\frac{\partial f}{\partial m}$ to be 0. With similar calculation, we can easily check that all the coefficients of $\frac{\partial f}{\partial v}$, $\frac{\partial f}{\partial\theta}$, $\frac{\partial^2 f}{\partial\theta\partial m}$, $\frac{\partial^2 f}{\partial\theta\partial v}$ are also 0. Therefore, the assertion about the best lower bound immediately follows.


Case k — ko = 2: In this scenario, we conjecture that W^{G, Gq) is still the best lower bound of 
V(pg^PGo)- Following the same proof recipe as above, such a conclusion follows from the hypothesis 
that for any fixed value $m \ne 0$, $\sigma^2 > 0$, the following system of 8 polynomial equations


'^dfai = 0, + bi) = 0, '^d‘f{^ + aibi) = 0, '^d‘f{^ + a^ibi + ^) = 0 

2=1 2=1 2=1 2=1 

- o o _ o 


E ,2 / mZ + m n + m 9 , 3m^ + 1 9 


2=1 


+ 


2 >{m? + l){m^ + m) 4 + m 2,2 , + 1 ^ cm 22 , \ n 

-4i^T--Tlfi —h + ^_4 bid - —tti Ci + Ci ) = 0 


2cr® 


2c74 
1 2 u 

-^^aibid - ^ 

,2 


72 / rn^ + m o + m o, 3m?1 o , 

^ Qg-2 g^4 g^2 ^ 2 ^ 2 ) — 0 


2=1 


A 2. m^ + m 4 m^+m 2.2 + 1 2, , hd. „ 

(-+ 2c74 -2^“' ° 


2=1 

3 


+ + m 9 , 2>m? + l 99 c? 

4,„4 + y> =» 

2=1 


does not have any non-trivial solution, i.e. $d_i \ne 0$ for all $1 \le i \le 3$ and at least one among $a_1, \dots, a_3$, $b_1, \dots, b_3$, $c_1, \dots, c_3$ is non-zero.
APPENDIX II


For the sake of completeness, we collect herein the proofs of technical results and auxiliary arguments that were left out of the main text and Appendix I.


PROOF OF C0R0LLARY |33I From Theorem l3.5[ the class {g{x\r], A), r/ € 0*, A G is iden¬ 
tifiable in the first order. From the proof of Theorem 13.11 in order to achieve the conclusion of our 
theorem, it remains to verify that g{x\r], A) satisfies condifions (011 and (l5]l. As fhe firsf derivafive of / 
in ferms of 0 and S is a-Holder confinuous, f{x\9, S) safisfies condifions (O and (lH) wifh Si = 62 = a. 

Now, for any 77 ^, 77 ^ G 0*, A G fl*, we have T{g^,A) = {9^, S) and T{rf‘ = (0^, S). For any 
1 < 7 < di, we obfain 


A 

dgi' 






l<U,V<d2 


l<U^V<d 2 


d9i 


{x\9\T.) 


dVi 
d[T2{v\A)] 
dVi 


+ 


Notice that



d[Ti{g\A)]^ 

dgi 


i=i ‘ 




,ari, 1 ,, STi, 2 


<L,||9‘-e''|r + L2l|i)‘-tf"r, 


where $L_1, L_2$ are two positive constants from the $\alpha$-Hölder continuity and the boundedness of the first derivative of $f(x|\theta,\Sigma)$ and $T(\eta,\Lambda)$. Moreover, since $T$ is Lipschitz continuous, it implies that $\|\theta^1 - \theta^2\| \lesssim \|\eta^1 - \eta^2\|$. Therefore, the above inequality can be rewritten as


dl 


»f, tya/, ,^2 „8[r,(,2,A)]^ 




i=l 


dgi 




i=l 


dgi 


<U-riY- 


With a similar argument, we get


E 


»/ , ,,.1 


l<U,V<d 2 


5S,. 


-(x|0^S) 


dgi 


E 


df ,.„m2 ^^^[^2(d^A)] 


l<U,V<d 2 




-(x|0^S)- 


dgi 




Thus, for any 1 < 7 < di. 


^(5r(x|77\ A) - 5 r(x|? 7 ^ A)) 


< 

r\^ 


As a consequence, for any 71 G , 




< - |^(x|772,S)||||7i|| < 1177^-77^ 


which means fhaf condition (0]) is satisfied by g{x\g, A). Likewise, we also can demonsfrafe fhaf con¬ 
dition (l5]l is satisfied by < 7 ( 0 :|d, A). Therefore, fhe conclusion of our corollary is achieved. 
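PROOF OF THEOREM 3.3. (a) Assume that we have $\alpha_j, \beta_j, \gamma_j \in \mathbb{R}$ as $1 \le j \le k$, $k \ge 1$ such that:
$$\sum_{j=1}^{k} \alpha_j f(x|\theta_j,\sigma_j) + \beta_j\frac{\partial f}{\partial\theta}(x|\theta_j,\sigma_j) + \gamma_j\frac{\partial f}{\partial\sigma}(x|\theta_j,\sigma_j) = 0.$$

Multiply both sides of the above equation with $\exp(itx)$ and take the integral in $\mathbb{R}$; we obtain the following result:
$$\sum_{j=1}^{k}\left[\left(\alpha_j' + \beta_j'(it)\right)\phi(\sigma_j t) + \gamma_j'\psi(\sigma_j t)\right]\exp(it\theta_j) = 0, \tag{55}$$
where $\alpha_j' = \alpha_j - \frac{\gamma_j}{\sigma_j}$, $\beta_j' = \beta_j$, $\gamma_j' = -\frac{\gamma_j}{\sigma_j}$, $\phi(t) = \int_{\mathbb{R}} \exp(itx) f(x)\,dx$, and $\psi(t) = \int_{\mathbb{R}} \exp(itx)\,x f'(x)\,dx$.

By direct calculation, we obtain $\phi(t) = \frac{\Gamma(p+it)\Gamma(q-it)}{\Gamma(p)\Gamma(q)}$. Additionally, from the property of the Gamma function and Euler's reflection formula, as $p, q$ are two positive integers, we have
$$\Gamma(p+it)\Gamma(q-it) = \begin{cases} \prod_{j=1}^{p-1}(p-j+it)\prod_{j=1}^{q-1}(q-j-it)\,\dfrac{\pi t}{\sinh(\pi t)}, & \text{if } p, q \ge 2 \\[2mm] \prod_{j=1}^{p-1}(p-j+it)\,\dfrac{\pi t}{\sinh(\pi t)}, & \text{if } p \ge 2,\ q = 1 \\[2mm] \prod_{j=1}^{q-1}(q-j-it)\,\dfrac{\pi t}{\sinh(\pi t)}, & \text{if } p = 1,\ q \ge 2 \\[2mm] \dfrac{\pi t}{\sinh(\pi t)}, & \text{if } p = q = 1. \end{cases} \tag{56}$$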
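The closed form above can be cross-checked against a direct numerical Fourier integral. The following is a minimal sketch, assuming the underlying density is the generalized logistic $f(x) = \frac{1}{B(p,q)}\frac{e^{px}}{(1+e^{x})^{p+q}}$ with positive integers $p, q$ (this form is an assumption, chosen because its characteristic function is $\Gamma(p+it)\Gamma(q-it)/(\Gamma(p)\Gamma(q))$). The function names, the integration window and the step size are illustrative choices.

```python
import cmath
import math

def gen_logistic_pdf(x, p, q):
    """Assumed density exp(p x) / (B(p, q) (1 + exp(x))^(p + q)), computed in log space."""
    log_beta = math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q)
    # Stable evaluation of log(1 + exp(x)).
    log1pexp = x + math.log1p(math.exp(-x)) if x > 0 else math.log1p(math.exp(x))
    return math.exp(p * x - (p + q) * log1pexp - log_beta)

def char_fn_numeric(t, p, q, half_width=60.0, step=0.01):
    """Trapezoidal approximation of the integral of exp(i t x) f(x) over [-half_width, half_width]."""
    n = int(round(2.0 * half_width / step))
    total = 0j
    for k in range(n + 1):
        x = -half_width + k * step
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * cmath.exp(1j * t * x) * gen_logistic_pdf(x, p, q)
    return total * step

def char_fn_closed(t, p, q):
    """Product form for integer p, q >= 1, using Gamma(1+it) Gamma(1-it) = pi t / sinh(pi t)."""
    prod = 1 + 0j
    for j in range(1, p):
        prod *= (p - j + 1j * t)
    for j in range(1, q):
        prod *= (q - j - 1j * t)
    ratio = 1.0 if t == 0 else math.pi * t / math.sinh(math.pi * t)
    return prod * ratio / (math.gamma(p) * math.gamma(q))

gap = abs(char_fn_numeric(0.7, 2, 3) - char_fn_closed(0.7, 2, 3))
```

For $p = 2$, $q = 3$ the two evaluations agree to well within the quadrature error, which supports the product representation in the case $p, q \ge 2$.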

From now on, we only consider the case $p, q \ge 2$ as other cases can be argued in the same way. Denote $\prod_{j=1}^{p-1}(p-j+it)\prod_{j=1}^{q-1}(q-j-it) = \sum_{u=0}^{p+q-2} a_u t^u$. It is clear that $a_0 = \prod_{j=1}^{p-1}(p-j)\prod_{j=1}^{q-1}(q-j)$ and $a_{p+q-2} = (-1)^{q-1} i^{p+q-2} \ne 0$.

From (56), the characteristic function $\phi(t)$ can be rewritten as
$$\phi(t) = \frac{2\pi\exp(\pi t)\left(\sum_{u=0}^{p+q-2} a_u t^{u+1}\right)}{\Gamma(p)\Gamma(q)\left(\exp(2\pi t) - 1\right)}. \tag{57}$$

Additionally, since xf'{x) and f'{x) are integrable functions. 


'ip{t) = j exp{itx)xf'{x)dx = ( f exp{itx)f'{x)dx j = (iA'^(A)) = 4 >{t) + t(p'{t). 


By direct computation, we obtain 


V’(i) = 


P+q-2 P+q-2 

27r( ^ a„(u + 2)f“"''^) exp(7rf) 27r^( ^ a„t“+^)(exp(27rf) + 1) exp(7rt) 

W=0 Fi=0 


r(p)r(9)(exp(27rf) - 1) 


r(p)r(9)(exp(7rf) - 1)2 


(58) 
Combining (57) and (58), we can rewrite (55) as


p+q—2 

k ( anO-“+H“+^)exp((7rcrj+6'j)i) 

(“i + /3j(*i)) -^^- + 

i=i 

p+q-2 


(exp(27rcjjt) — 1) 


7 ji a«(tt + 2)(T“+^i“+^)exp((7rfTj+z0j)t) 


u=0 


(exp(27r(Tji) — 1) 


p+q-2 

7'7r( ^ a„(T"+^t“+^)(exp(27r(Tjt) + 1) exp((7r(Tj + i9j)t) 

n=0 


r(p)r(g)(exp(7r6»jt) - If 


= 0 . 


0 ,- 


Denote t' = vrt, 9'- = —, fj- = —, aif = 


Jj) _ ^ 0 ) ^ ciuiu + 2)a]+^ ^ 


TT ^ TT ’ 7^w+l 


TT 


u+1 


and cf = 


TT 


u+2 

j _ 

•u+2 


for all 1 < j < k, 0 < u < p + q — 2 and multiply both sides of the above equation with 

k 


(exp(2(Tjf) — 1)^, we can rewrite it as 


j=i 


k p+q—2 

+!}’;+')){ “7(*')“+‘) + 

j=l u=0 

p+q-2 

I'ji X] ^u\t'T^^)) exp((crj + r0')f')(exp(2crjf') - 1) J| (exp(2a/f') - 1)^ - 

u=0 l^j 

p+q-2 

n=0 l^j 

Without loss of generality, we assume that cti < <72 < ... < <Tk- Note that, we can view 


exp(f'cjj)(exp(2(Tjf') — 1) 0 (exp(2cr;f') — 1)^ as expfef’) where < ef’ < ... < e 

l^j u=l 

are just the combinations of cti, £72,..., cr^ and mj > 1 for all 1 < j < k. Similarly, we can write 


dj)\ 


dj) ^ Jj) 


U) 


exp(f'crj)(exp(2(Tjf') + 1) 0 (exp(2cjif') — 1)^ as ku^ expfh'ff, where hf’ < ... < hf. and 

ij^j «=i 

Uj > 1 for all I < j < k. 


Sj)^ 


,(i) 


Direct calculation yields Cm] = hf. = 4 ^ <7; + 3aj and eHj = hf’ = 1 for all I < j < k. 
From the assumption, it is straightforward that em\ > eml > ■■■ > eml- Additionally, by denot- 

„ p+'j-s , .s p+ij-2 -p+g-i , 

ing {a'j +/3j{it')){ a^u\t'T^f + Iji E = E we obtain= 

u =0 u =0 u =0 

and = if ”for all 1 < j < k. 

By applying the Laplace transformation in both sides of equation (l59l) . we get: 

k p+q-l rrij ^ P+Q-^ , oO 


Aj) 


Aj) 


Aj) 


E E fA E 


j=l 11=0 


dfiju + 1)! 

Aj) 


U1 


(a AJ)'\u+2 

=1 t* — Zui ) „=o 


^ TjTTC^ ^ fcni = 0 as Res(s) > . (60) 


Ul 


=1 (s - 
where Zui = iO'j + elf/ as 1 < m < nij and telf/ = iO', + /ilf/ as 1 < tii < rij. 


Aj) 


(i) 


,(i) 


Multiplying both sides of equation (l60l ) with (s — and letting s —)■ Zml, as ^ 


-til 


< e 


mi 


for all (Mi,j) / (mi, 1 ) and = e^^| for all (wi, j) 7 ^ (ni, 1 ), we obtain |/p+q_i(im| - 

— 0 Since — 1 — iB ”and 

ll^^p+q-2^iil I ~ ^lace ami — >^ni — /p+g-1 ~ *Pl ®p+q-2’ ^p+q-2 ~ ^ ^p+q-2’ 

p+q—1 

(1) _ ap+q-2(Ti 

P+1—2 ^p+q—1 

Likewise, multiplying both sides of (1^ with (s — Zmi)^'^'^ and let s 

p(i) _ n r'r^r.Ur...o mic focUir^r. ..^.01 of (1^ with (s — Zml) and let s -+ Zml 

1 11 

to get /q^'' = 0 or equivalently a/oQ^'* = 0 . As Oq^'* = f+i Y\ ijp — j) 11(9“ i)/^ / 0> it implies 

i=i i=i 

that $\alpha'_1 = 0$. Overall, we achieve $\alpha'_1 = \beta'_1 = \gamma'_1 = 0$. Repeating the same argument, we achieve $\alpha'_j = \beta'_j = \gamma'_j = 0$ for all $1 \le j \le k$, or equivalently $\alpha_j = \beta_j = \gamma_j = 0$.


7 ^ 0 , it implies that |i/3/ — 7 (fTi| = 0 or equivalently /?/ = 7 ^ = 0 . 

(i)ip+q ^ obtain 

/p+q _2 = 0- Continue this fashion until we multiply both sides of (1^ with (s — Zml) and let o —r ^mi 


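(b) Assume that we can find $\alpha_j, \beta_j, \gamma_j, \eta_j \in \mathbb{R}$ such that
\[
\sum_{j=1}^{k} \alpha_j f(x|\theta_j,\sigma_j,\lambda_j) + \beta_j \frac{\partial f}{\partial \theta}(x|\theta_j,\sigma_j,\lambda_j) + \gamma_j \frac{\partial f}{\partial \sigma}(x|\theta_j,\sigma_j,\lambda_j) + \eta_j \frac{\partial f}{\partial \lambda}(x|\theta_j,\sigma_j,\lambda_j) = 0. \tag{61}
\]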


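Applying the moment generating function to both sides of equation (61), we obtain
\[
\sum_{j=1}^{k} \big(\alpha'_j + \beta'_j t + \gamma'_j t\,\psi(\lambda_j - \sigma_j t) + \eta'_j \psi(\lambda_j - \sigma_j t)\big) \exp(\theta'_j t)\,\Gamma(\lambda_j - \sigma_j t) = 0 \quad \text{as } t < \min_{1 \le j \le k}\Big\{\frac{\lambda_j}{\sigma_j}\Big\}, \tag{62}
\]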


where a' = 13' = rf = and 


r(A,) r(A,) 

0 ' = 9j + log(Aj)(Tj as ijj is di-gamma function. 


r(A,)’ 


r(A, 


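Without loss of generality, we assume that $\sigma_1 \le \sigma_2 \le \ldots \le \sigma_k$. We choose $\bar{i}$ to be the minimum index such that $\sigma_{\bar{i}} = \sigma_k$. Denote $i_1 \in [\bar{i}, k]$ as the index such that $\theta'_{i_1} = \min_{\bar{i} \le i \le k}\{\theta'_i\}$. Denote $I = \{i \in [\bar{i}, k] : \theta'_i = \theta'_{i_1}\}$. From the formation of $\theta'_j$, it implies that the $\lambda_i$ are pairwise different as $i \in I$.

Choose $i_2 \in I$ such that $\lambda_{i_2} = \max_{i \in I} \lambda_i$, i.e. $\lambda_{i_2} > \lambda_i$ for all $i \in I$ with $i \ne i_2$. Divide both sides of equation (62) by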

fr(l — ai^t)'iA{l — (Ti^t) exp( 6 <' A), we get that as A < — 


< I PL I ./ I I y a;.r(A, -a,A)exp(g(.A) 

tipiXi^ - cji^t) ?/'(Ai2 - ^nP) t Ar(Ai2 - ai^t)'tp{Xi2 - (Ji^t) ex'i){9[^t) 

PjT{Xj - ajt) exp( 6 ''A) 7 ' exp(0'A)r(Aj - ajt)'il;{Xj - ajt) 

r(Ai2 - o'i2^)V'(Ai2 - exp(^i2^) exp(6''2A)r(Ai2 - o-j2A)z/(Ai2 - cFi^t) 

'q'j exp(0'A)r(Aj — ajt)'4){Xj — ajt) 
Aexp(6''^A)r(Ai2 - ai^t)'il}{Xi^ - Oi^t) 


+ 

-h 


= 0. (63) 


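Note that $\lim_{t \to -\infty} \psi(\lambda_j - \sigma_j t)/\psi(\lambda_{i_2} - \sigma_{i_2} t) = 1$ for all $1 \le j \le k$. Additionally, when $j \in I$ and $j \ne i_2$, as $\lambda_j < \lambda_{i_2}$, we see that $\Gamma(\lambda_j - \sigma_j t)/\Gamma(\lambda_{i_2} - \sigma_{i_2} t) \to 0$ as $t \to -\infty$ and $\exp((\theta'_j - \theta'_{i_2})t) = 1$. It implies that $\exp(\theta'_j t)\Gamma(\lambda_j - \sigma_j t)/[\exp(\theta'_{i_2} t)\Gamma(\lambda_{i_2} - \sigma_{i_2} t)] \to 0$ as $t \to -\infty$. Since $\psi(\lambda_{i_2} - \sigma_{i_2} t) \to +\infty$ as $t \to -\infty$, if we let $t \to -\infty$, we obtain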


^ a'r(Aj — ajt) exp( 0 'f) — ^jt) exp{9jt) 

^r(Ai2 - exp((9'2f) r(Ai2 - o'*2^)V'(Ai2 - o-*2^) exp((9'2f) 

7 ' exp(0'f)r(Aj - ajt)'4){\j — ajt) rj'- exp(0'f)r(Aj - ajt)'ip{Xj - Ujt) 
exp( 6 »' 2 f)r(Ai 2 - ai^t)i}{Xi^ - Oi^t) tex.Y>{e[^t)V{Xi^ - ai^t)'ijj{\i^ - ai^t) 


+ 


0. (64) 


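Additionally, as $j \ge \bar{i}$ and $j \notin I$, we have $\sigma_j = \sigma_{i_2}$ and $\theta'_j > \theta'_{i_2}$. Therefore, we obtain $\exp((\theta'_j - \theta'_{i_2})t)\,\Gamma(\lambda_j - \sigma_j t)/\Gamma(\lambda_{i_2} - \sigma_{i_2} t) \to 0$ as $t \to -\infty$. As a consequence, if we let $t \to -\infty$, then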


^ a'r(Aj — ajt) exp( 0 'f) - ^ji) exp( 6 *'t) 

- fr(Ai 2 - o-i 2 *)V’(Ai 2 - exp( 6 '- t) r(Ai 2 “ ^i 2 't)‘^{^i 2 - f^i 2 ^) exp( 6 '' t) 

7 ' exp( 6 ''f)r(Aj - ajt)ip{Xj - ajt) r(- exp(0'f)r(Aj - ajt)il){Xj - ajt) 
exp( 6 »'^f)r(Ai 2 - ai^t)i>{Xi^ - ai^t) t ex.^{e[^t)T{Xi^ - ai^t)i>{Xi^ - ai^t) 


+ 


0. (65) 


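Now, as $j < \bar{i}$, we have $\sigma_j < \sigma_{i_2}$. Therefore, as $\Gamma(\lambda_j - \sigma_j t)/\Gamma(\lambda_{i_2} - \sigma_{i_2} t) \sim (-t)^{(\sigma_{i_2} - \sigma_j)t}$ when $t < 0$, we get $\exp((\theta'_j - \theta'_{i_2})t)\,\Gamma(\lambda_j - \sigma_j t)/\Gamma(\lambda_{i_2} - \sigma_{i_2} t) \to 0$ as $t \to -\infty$. As a consequence, if we let $t \to -\infty$, then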


^ a'r(Aj — ajt) exp{9jt) ~ ^j't) exp(0'T) 

^ fr(Ai 2 - ^i 2 t)'^{^i 2 - exp( 6''20 r(Ai 2 - ^i 2 t)i’{^i 2 - exp(6''2f) 

3 

7 ' exp(0'T)r(Aj - ajt)ijj{Xj - ajt) rjj ex.p{9jt)T{Xj - ajt)'4>{Xj - ajt) 
exp{9'^^t)T{Xi^ - ai^t)ip{Xi^ - ai^t) texp{9[^t)T{Xi^ - ai^t)il){Xi^ - ai^t) 


+ 

^ 0 . ( 66 ) 


Combining (64), (65), and (66), by letting $t \to -\infty$ in (63), we get $\gamma'_{i_2} = 0$. With this result, we divide both sides of (62) by $t\exp(\theta'_{i_2} t)\Gamma(\lambda_{i_2} - \sigma_{i_2} t)$ and obtain that as $t \to -\infty$


^ r?^ 2 V^(Aj 2 - (Tj^t) a'jT{Xj - ajt) exp(g^t) /3jT{Xj - ajt) exp{9'jt) 

t t ^^tT{Xi^ - ai^t)ex.p{9[^t) T{Xi^ - at^t) ex.p{9[^t) 

7 ' exp(0'T)r(Aj — ajt)ip{Xj — ajt) r]j exp(0'f)r(Aj — ajt)il^{Xj — ajt) 
ex.p{9'i^t)T{Xi^ - ai^t) f exp(6''2f)r(Ai2 - (Ti2f) 

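Using the same argument, and noting that $\exp((\theta'_j - \theta'_{i_2})t)\,\psi(\lambda_j - \sigma_j t)\Gamma(\lambda_j - \sigma_j t)/\Gamma(\lambda_{i_2} - \sigma_{i_2} t) \to 0$ as $t \to -\infty$ for all $j \ne i_2$ and $\psi(\lambda_{i_2} - \sigma_{i_2} t)/t \to 0$ as $t \to -\infty$, we obtain $\beta'_{i_2} = 0$. Continuing in this fashion, we divide both sides of (62) by $\psi(\lambda_{i_2} - \sigma_{i_2} t)\exp(\theta'_{i_2} t)\Gamma(\lambda_{i_2} - \sigma_{i_2} t)$ and $\exp(\theta'_{i_2} t)\Gamma(\lambda_{i_2} - \sigma_{i_2} t)$ respectively, and by letting $t \to -\infty$ we get $\alpha'_{i_2} = \eta'_{i_2} = 0$. Applying this argument to the remaining indices, we achieve $\alpha'_j = \beta'_j = \gamma'_j = \eta'_j = 0$ for $1 \le j \le k$, or equivalently $\alpha_j = \beta_j = \gamma_j = \eta_j = 0$ for $1 \le j \le k$.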


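(c) Assume that we can find $\alpha_j, \beta_j, \gamma_j \in \mathbb{R}$ such that
\[
\sum_{j=1}^{k} \alpha_j f_X(x|\nu_j,\lambda_j) + \beta_j \frac{\partial f_X}{\partial \nu}(x|\nu_j,\lambda_j) + \gamma_j \frac{\partial f_X}{\partial \lambda}(x|\nu_j,\lambda_j) = 0.
\]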
It implies that by the transformation $Y = \log(X)$, we still have:
\[
\sum_{j=1}^{k} \alpha_j f_Y(y|\nu_j,\lambda_j) + \beta_j \frac{\partial f_Y}{\partial \nu}(y|\nu_j,\lambda_j) + \gamma_j \frac{\partial f_Y}{\partial \lambda}(y|\nu_j,\lambda_j) = 0, \tag{67}
\]
where $f_Y(y)$ is the density function of $Y$.


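Applying the moment generating function to both sides of (67), we obtain
\[
\sum_{j=1}^{k} \alpha_j \lambda_j^{t}\,\Gamma\Big(\frac{t}{\nu_j} + 1\Big) - \beta_j \frac{t}{\nu_j^2}\lambda_j^{t}\,\Gamma\Big(\frac{t}{\nu_j} + 1\Big)\psi\Big(\frac{t}{\nu_j} + 1\Big) + \gamma_j t \lambda_j^{t-1}\,\Gamma\Big(\frac{t}{\nu_j} + 1\Big) = 0 \quad \text{as } t > -\min_{1 \le j \le k}\{\nu_j\}. \tag{68}
\]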
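The moment generating function of $Y = \log(X)$ is the $t$-th moment of $X$, which is where the factor $\lambda^t\Gamma(t/\nu + 1)$ in (68) comes from. The sketch below checks this numerically. It assumes the Weibull density is parametrized with shape $\nu$ and scale $\lambda$ (the parametrization is not spelled out in this section), and the function names are illustrative only.

```python
import math


def weibull_pdf(x, shape, scale):
    """Weibull density with shape nu and scale lambda (assumed parametrization)."""
    u = x / scale
    return (shape / scale) * u ** (shape - 1) * math.exp(-(u ** shape))


def log_weibull_mgf_closed_form(t, shape, scale):
    """E[exp(t log X)] = E[X^t] = scale^t * Gamma(t / shape + 1), for t > -shape."""
    return scale ** t * math.gamma(t / shape + 1.0)


def log_weibull_mgf_numeric(t, shape, scale, upper=15.0, steps=150_000):
    """Midpoint-rule approximation of the integral of x^t f(x) over [0, upper]."""
    h = upper / steps
    total = 0.0
    for n in range(steps):
        x = (n + 0.5) * h
        total += x ** t * weibull_pdf(x, shape, scale)
    return total * h


if __name__ == "__main__":
    t, shape, scale = 1.2, 2.0, 1.5
    print(log_weibull_mgf_closed_form(t, shape, scale))
    print(log_weibull_mgf_numeric(t, shape, scale))
```

Differentiating the closed form with respect to the shape and the scale gives the $\beta_j$ and $\gamma_j$ terms of (68).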

Without loss of generality, assume that < 1^2 < ■ ■ ■ < Denote i as the minimum index such that 
i/j = vi and ii is index such that = min {Aj}, which implies that < Xi for all 1 < i < i. 

Using the same argument as that of generalized gumbel density function case, we firstly divide both 
sides of (l 6 ^ by + l)il3{t/i'j + 1) and let t + 00 , we obtain = 0. Then, with this result, 

we divide both sides of (| 68 ] ) by + 1) and let t + 00 , we get 7 ^^ = 0. Finally, divide both 

sides of (l 6 ^ by + 1) and let t + 00 , we achieve = 0. Repeat the same argument until we 

obtain ccj = /3j = 7 ^ = 0 for all 1 < z < A:. 


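(d) The idea of this proof is based on the main theorem of Kent [1983]. Assume that we can find $\alpha_j, \beta_j, \gamma_j \in \mathbb{R}$ such that
\[
\sum_{j=1}^{k} \alpha_j f(x|\mu_j,\kappa_j) + \beta_j \frac{\partial f}{\partial \mu}(x|\mu_j,\kappa_j) + \gamma_j \frac{\partial f}{\partial \kappa}(x|\mu_j,\kappa_j) = 0.
\]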


We can rewrite the above equation as
\[
\sum_{j=1}^{k} \big[\alpha'_j + \beta'_j \sin(x - \mu_j) + \gamma'_j \cos(x - \mu_j)\big] \exp(\kappa_j \cos(x - \mu_j)) = 0 \quad \text{for all } x \in [0, 2\pi), \tag{69}
\]


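where $C(\kappa) = \dfrac{1}{2\pi I_0(\kappa)}$, $\alpha'_j = C(\kappa_j)\alpha_j + C'(\kappa_j)\gamma_j$, $\beta'_j = C(\kappa_j)\kappa_j\beta_j$, and $\gamma'_j = C(\kappa_j)\gamma_j$ for all $1 \le j \le k$.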


Since the functions $\exp(\kappa_j \cos(x - \mu_j))$, $\cos(x - \mu_j)\exp(\kappa_j \cos(x - \mu_j))$, and $\sin(x - \mu_j)\exp(\kappa_j \cos(x - \mu_j))$ are analytic functions of $x$, we can extend equation (69) to the whole range $x \in \mathbb{C}$. Denote $x = y + iz$, where $y, z \in \mathbb{R}$. Direct calculation yields $\cos(x - \mu_j) = \cos(y - \mu_j)\cosh(z) - i\sin(y - \mu_j)\sinh(z)$, $\sin(x - \mu_j) = \sin(y - \mu_j)\cosh(z) + i\cos(y - \mu_j)\sinh(z)$, and
\[
\exp(\kappa_j \cos(x - \mu_j)) = \exp\big(\kappa_j[\cos(y - \mu_j)\cosh(z) - i\sin(y - \mu_j)\sinh(z)]\big).
\]
Therefore, we can rewrite equation (69) as, for all $y, z \in \mathbb{R}$,


\[
\sum_{j=1}^{k} \Big\{\alpha'_j + \big[\beta'_j \sin(y - \mu_j) + \gamma'_j \cos(y - \mu_j)\big]\cosh(z) + i\big[\beta'_j \cos(y - \mu_j) - \gamma'_j \sin(y - \mu_j)\big]\sinh(z)\Big\} \exp\big(\kappa_j[\cos(y - \mu_j)\cosh(z) - i\sin(y - \mu_j)\sinh(z)]\big) = 0. \tag{70}
\]
As $(\mu_j, \kappa_j)$ are pairwise different for $1 \le j \le k$, we can choose at least one $y^* \in [0, 2\pi)$ such that $m_j = \kappa_j \cos(y^* - \mu_j)$ are pairwise different for $1 \le j \le k$ and $\cos(y^* - \mu_j)$, $\sin(y^* - \mu_j)$ are all different from 0 for all $1 \le j \le k$. Without loss of generality, we assume that $m_1 < m_2 < \ldots < m_k$. Setting $y = y^*$ and multiplying both sides of (70) by $\exp(-m_k \cosh(z) + i\kappa_k \sin(y^* - \mu_k)\sinh(z))$, we obtain
\[
\Big|\alpha'_k + \big[\beta'_k \sin(y^* - \mu_k) + \gamma'_k \cos(y^* - \mu_k)\big]\cosh(z) + i\big[\beta'_k \cos(y^* - \mu_k) - \gamma'_k \sin(y^* - \mu_k)\big]\sinh(z)\Big|
\]
\[
\le \sum_{j=1}^{k-1} \Big|\alpha'_j + \big[\beta'_j \sin(y^* - \mu_j) + \gamma'_j \cos(y^* - \mu_j)\big]\cosh(z) + i\big[\beta'_j \cos(y^* - \mu_j) - \gamma'_j \sin(y^* - \mu_j)\big]\sinh(z)\Big| \exp((m_j - m_k)\cosh(z)).
\]
Note that as $m_j < m_k$ for all $1 \le j \le k-1$,
\[
\lim_{z \to \infty} \cosh(z)\exp((m_j - m_k)\cosh(z)) = \lim_{z \to \infty} \sinh(z)\exp((m_j - m_k)\cosh(z)) = 0.
\]
Therefore, by letting $z \to \infty$ on both sides of the above inequality, we obtain
\[
\Big|\alpha'_k + \big[\beta'_k \sin(y^* - \mu_k) + \gamma'_k \cos(y^* - \mu_k)\big]\cosh(z) + i\big[\beta'_k \cos(y^* - \mu_k) - \gamma'_k \sin(y^* - \mu_k)\big]\sinh(z)\Big| \to 0.
\]
It implies that $\alpha'_k = 0$, $\beta'_k \sin(y^* - \mu_k) + \gamma'_k \cos(y^* - \mu_k) = 0$, and $\beta'_k \cos(y^* - \mu_k) - \gamma'_k \sin(y^* - \mu_k) = 0$. These equations imply $\alpha'_k = \beta'_k = \gamma'_k = 0$. Repeating the same argument for the remaining $\alpha'_j, \beta'_j, \gamma'_j$ with $1 \le j \le k-1$, we eventually achieve $\alpha'_j = \beta'_j = \gamma'_j = 0$ for all $1 \le j \le k$, or equivalently $\alpha_j = \beta_j = \gamma_j = 0$ for all $1 \le j \le k$.
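The argument in part (d) rests on the expansions of $\cos(x - \mu_j)$ and $\sin(x - \mu_j)$ at a complex argument $x = y + iz$. As a quick sanity check, the following sketch compares both expansions with the standard complex cosine and sine at a few arbitrary points. The helper names and the sample points are illustrative assumptions, not part of the original proof.

```python
import cmath
import math


def cos_shifted(y, z, mu):
    """cos(y - mu) cosh(z) - i sin(y - mu) sinh(z), the claimed value of cos(x - mu)."""
    return complex(math.cos(y - mu) * math.cosh(z),
                   -math.sin(y - mu) * math.sinh(z))


def sin_shifted(y, z, mu):
    """sin(y - mu) cosh(z) + i cos(y - mu) sinh(z), the claimed value of sin(x - mu)."""
    return complex(math.sin(y - mu) * math.cosh(z),
                   math.cos(y - mu) * math.sinh(z))


def max_identity_error(points):
    """Largest deviation between the expansions and cmath over (y, z, mu) triples."""
    worst = 0.0
    for y, z, mu in points:
        x = complex(y, z)
        worst = max(worst,
                    abs(cmath.cos(x - mu) - cos_shifted(y, z, mu)),
                    abs(cmath.sin(x - mu) - sin_shifted(y, z, mu)))
    return worst


# Arbitrary test points (y, z, mu), chosen only for illustration.
SAMPLE_POINTS = [(0.3, 1.2, 0.7), (2.5, -0.8, 1.1), (5.9, 2.0, 4.4)]

if __name__ == "__main__":
    print(max_identity_error(SAMPLE_POINTS))
```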


PROOF OF THEOREM 3.4 (Continued). Part (a) was proved in Appendix I. The following is the proof for the remaining parts.

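(b) Consider that for given $k \ge 1$ and $k$ different pairs $(\theta_1, \Sigma_1), \ldots, (\theta_k, \Sigma_k)$, where $\theta_j \in \mathbb{R}^d$, $\Sigma_j \in S_d^{++}$ for all $1 \le j \le k$, we can find $\alpha_j \in \mathbb{R}$, $\beta_j \in \mathbb{R}^d$, and symmetric matrices $\gamma_j \in \mathbb{R}^{d \times d}$ such that:
\[
\sum_{j=1}^{k} \alpha_j f(x|\theta_j,\Sigma_j) + \beta_j^\top \frac{\partial f}{\partial \theta}(x|\theta_j,\Sigma_j) + \mathrm{tr}\Big(\frac{\partial f}{\partial \Sigma}(x|\theta_j,\Sigma_j)^\top \gamma_j\Big) = 0. \tag{71}
\]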


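Multiplying both sides by $\exp(it^\top x)$ and taking the integral over $\mathbb{R}^d$, we get:
\[
\sum_{j=1}^{k} \int_{\mathbb{R}^d} \exp(it^\top x)\Big[\alpha_j f(x|\theta_j,\Sigma_j) + \beta_j^\top \frac{\partial f}{\partial \theta}(x|\theta_j,\Sigma_j) + \mathrm{tr}\Big(\frac{\partial f}{\partial \Sigma}(x|\theta_j,\Sigma_j)^\top \gamma_j\Big)\Big]\,dx = 0. \tag{72}
\]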


Notice that 

/ exp{it'^x)f{x\9j,T,j)dx 
R'i 

j exp{if'x)l3j^{x\6j,J:j)dx 

R‘i 


exp{if9,) J exp{iiJ:yh)^x)j-^^^-^^^^^dx. 

R'i 

C{u + d) f exp(i(s]/^i)^x)/3jS7^/^x 

2 J (z.+ ||x||2)(^+rf+2)/2 

Rd- 
and 


J ex.p{it^x) tT{^{x\9jjj)dx = -^tr(S^- ^'yj) ex.p{it^9j) x 


J exp{it^9j) 




exp{i{T.y^t)'^x) 

{u + || x | p )('^+‘^)/2 


dx + 


C{v + d) 


■exp{it^9j) J 


Rd 


p{i{T,y'^tyx) tr(i:^. 

{u + ||xP)(‘"+‘^+ 2)/2 


dx. 


From the property of trace of matrix, tr(S^ ^^^xx^S^ Equation (17^ 

can be rewritten as 


k 

E 

j=i 


a'- exp(i(sj^^f)’^x) exp(i(sj'^^f)'^x)(/3')^x exp(i(sJ'^^f)'^x)x'^Mjx\ 


+ 


+ 


- dx 


(^1/ _j_ (z^ + ||2;||2)(^+'^+2)/2 (^jy _|_ || j.||2^(i/+d+2)/2 j 

X exp(zf^0j) = 0,(73) 


tr(S^ ^ {iy + d)^_, 
2 ’ 2 


where a^- = /3' = ^ 2 ^^.^ and Mj = 


J 


^ = 


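To simplify the left hand side of equation (73), it is sufficient to calculate the following quantities
\[
A = \int_{\mathbb{R}^d} \frac{\exp(it^\top x)}{(\nu + \|x\|^2)^{(\nu+d)/2}}\,dx, \quad B = \int_{\mathbb{R}^d} \frac{\exp(it^\top x)(\beta')^\top x}{(\nu + \|x\|^2)^{(\nu+d+2)/2}}\,dx, \quad \text{and} \quad C = \int_{\mathbb{R}^d} \frac{\exp(it^\top x)\,x^\top M x}{(\nu + \|x\|^2)^{(\nu+d+2)/2}}\,dx,
\]
where $\beta' \in \mathbb{R}^d$ and $M = (M_{ij}) \in \mathbb{R}^{d \times d}$.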

In fact, using the orthogonal transformation $x = Oz$, where $O \in \mathbb{R}^{d \times d}$ has first column $\big(\frac{t_1}{\|t\|}, \ldots, \frac{t_d}{\|t\|}\big)^\top$, it is not hard to verify that $\exp(it^\top x) = \exp(i\|t\|z_1)$, $\|x\|^2 = \|z\|^2$, and $dx = |\det(O)|\,dz = dz$. We then obtain the following results:

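\[
A = \int_{\mathbb{R}^d} \frac{\exp(i\|t\|z_1)}{(\nu + \|z\|^2)^{(\nu+d)/2}}\,dz = \int_{\mathbb{R}} \exp(i\|t\|z_1) \int_{\mathbb{R}} \cdots \int_{\mathbb{R}} \frac{1}{(\nu + \|z\|^2)^{(\nu+d)/2}}\,dz_d\,dz_{d-1} \ldots dz_1 = C_1 A_1(\|t\|),
\]
where $C_1 = \prod_{j=2}^{d} \int_{\mathbb{R}} \frac{1}{(1 + z^2)^{(\nu+j)/2}}\,dz$ and $A_1(t') = \int_{\mathbb{R}} \frac{\exp(i|t'|z)}{(\nu + z^2)^{(\nu+1)/2}}\,dz$ for any $t' \in \mathbb{R}$.

Hence, for all $1 \le j \le k$,
\[
\int_{\mathbb{R}^d} \frac{\exp(i(\Sigma_j^{1/2}t)^\top x)}{(\nu + \|x\|^2)^{(\nu+d)/2}}\,dx = C_1 A_1(\|\Sigma_j^{1/2}t\|). \tag{74}
\]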


Turning to B: 


b = Y.I^'= 

3 — ^ TI 


exp(ff^x)xj 


{v + ||x||2)('^+'^+2)/2 


3 — ^ H 


exp(f||f|| 2 :i)(X; Ojizi) 


l=i 


{u + ||z|| 2 )(^+'^+ 2)/2 


-dz. 


(75) 
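When $l \ne 1$, since $\dfrac{z_l}{(\nu + \|z\|^2)^{(\nu+d+2)/2}}$ is an integrable odd function of $z_l$,
\[
\int_{\mathbb{R}^d} \frac{\exp(i\|t\|z_1)\,z_l}{(\nu + \|z\|^2)^{(\nu+d+2)/2}}\,dz = 0.
\]
Simultaneously, using the same argument as (74), we get
\[
\int_{\mathbb{R}^d} \frac{\exp(i\|t\|z_1)\,z_1}{(\nu + \|z\|^2)^{(\nu+d+2)/2}}\,dz = C_2 A_2(\|t\|),
\]
where $C_2 = \prod_{j=2}^{d} \int_{\mathbb{R}} \frac{1}{(1 + z^2)^{(\nu+2+j)/2}}\,dz$ and $A_2(t') = \int_{\mathbb{R}} \frac{\exp(i|t'|z)\,z}{(\nu + z^2)^{(\nu+3)/2}}\,dz$ for any $t' \in \mathbb{R}$.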

R R 

Therefore, we can rewrite (1751) as 

l|i|| 


It demonstrates that for all 1 <j<k 

J (17+ ||x||2)(-+«*+2)/2 p 

Rrf 


(76) 


Turning to C: 


Ti r f exp(if x)xj r,\^ i\r f exp(^it^x 

= ^ J (^+||^||2)++d+2)/2'^^ + Jlj (j7+||x||2)C 

J 1 Rd I^d 

Notice that, for each 1 < y < d: 

r exp(d^x)x| exp(i||(||^i)(EO„^,)2)2 ^ 

J {l/+ ||a-|P)<‘'+'<+2)/2‘^^ “ J (|/+ ||z||2)(-+<i+2)/2 * 

R'i R"^ 


i7+d+2)/2 


^ v-n 2 f I 

||_2||2)++d+2)/2“^ 

^ = 1 TUd 


f exp{i\\t\\zi)zuZy ^ 

h ' 'j (^+||.f)(-+^+2)/2^"- 

roti 


exp(i||f||zi)2;„2;„ . 

■ ^^|U|I 2 n 07 ^^^ 2 W 2 ^^ = 0- Addihon- 


Asu < V, then one of u,v will differ from 1. It follows that / -- , . ,,„w„ 

J (r^ + llzir )++“+2)/2 

ally, as / / 1, we see that 

f exp{i\\t\\zi)zf , _ /■ 

J (j/+||z||2)(-+^+2)/2“^ J 

R'^ R'^ 


exp(ij^^i)£|_ _ 

IUI|2')++d+2)/2®^ 6x3.4i(||f||). 


{u+ \\z 
1 


where = 


/ 


;dz\[ 


(1 + J (1 + 2;2)(!^+2+i)/2 


dz. 


r,- , f exp(i||i||zi)z? , ^ ,,s . . / /N f expfih'lz)^^ , ^ , 

Similarly, j ^ = C^ 2 ^ 3 (||t||). where ) = J my t 


Therefore, 


(.+lx||ff^+2)/2 ‘i^' = C20|l^3(l|(||) + C3(l - 0|,)A. 


As a consequence, 

d 


j 

i=i w 


eyiy){if'y)x] 




dx = C2(^M,yO|i)A3(||f||) + 


i=i 


C3(j;M,,(l-0|i))Ai(||f||) 

j=i 


Simultaneously, as j ^ I 


/ 


exp(if^a:)xja:; 
(i/' + 


dx — ^ ^ OjuOii 


U=1 


exp(i|t||zi)zg 
{y + || 2 ;|| 2 )('^+i^+ 2)/2 


dz = 


u=2 


Combining (T/^ and (17^ . we can rewrite (T/Tl) as: 


C = C3(j;M,-,0Ai(||f||) + (^M,,O,iO;i)(C2A3(||f||)-C73Ai(||f||)) 

j=i 

d 

= C3(^M,y)Ai(||t||) + . 

i=i " j,i 


A(y^Afj,tj(,)(C2^3(ll(||)-C3Ai(||t||)). 


Thus, for all 1 < y < d 

I 


exp{i{Y,y‘^t)'^ x)x'^ MjX l 

'' dx = 




(z^ + ||xP)(^+'^+^)/^ 

d 

x(C2A3(||s;/2f||)-C3Ai(||sff||)) + C3(j;M,pAi(||sff||), 


1=1 


(78) 


C20yi0llA3(||f||) + C3(^ OyOiMM) = OilO/l(C2A3(PI|) " ^3^1(11^11)). (79) 


(80) 


where mP indicates the element at w-th row and u-th column of Mj and simply means the 

1/2 

n-th component of S ■ t. 
As a consequence, by combining (74), (76), and (80), we can rewrite (73) as:


k B' 

E [“Ai(ll^]''"‘ll) + C2 ’ HI „ ^(II^E*ll) + es(E+ 

i=l ll^j ^11 l=l 

(C2^3(||Sy^t||)-C3Ai(||Ey^t||))]exp(it^g^^ = 0. (81) 

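Define $t = t_1 t'$, where $t_1 \in \mathbb{R}$ and $t' \in \mathbb{R}^d$. By using the same argument as that of the multivariate generalized Gaussian distribution, we can find $D$ to be the finite union of conics and hyperplanes such that as $t' \notin D$, $((t')^\top\theta_1, (t')^\top\Sigma_1 t'), \ldots, ((t')^\top\theta_k, (t')^\top\Sigma_k t')$ are pairwise distinct. By denoting $\theta'_j = (t')^\top\theta_j$ and $\sigma_j^2 = (t')^\top\Sigma_j t'$, we can rewrite (81) as: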

k ^ B'- 

^ [a'Ai((Tj|fi|) +C 2 -r^-|-- ^-A 2 {aj\ti\) + C' 3 (^+ 

j=i J 


l=i 


-^)(C2A3((Tj|ti|) - (7'3Ai(crj|fi|)]exp(t6''ti) = 0. 


u,v J 

Since A 2 {(Tj\ti\) = (t|fi|)Ai((Tj|ti|), the above equation can be rewritten as: 

k d vyl/2 n 

E[(«;+ c3C£K) - c3(E^g. " ,2 " ))Ai(.T,itii) + 


C2(!*i) 


i=i 


1=1 


O': 


(To- 


^i(c^i|ii|) + -^)A3(cjj|fi|)]exp(t6l'ti) = 0.(82) 


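As $\nu$ is an odd number, we assume $\nu = 2l - 1$. By applying Lemma 7.3 (stated and proved in the sequel), we obtain for any $m \in \mathbb{N}$ that
\[
\int_{-\infty}^{+\infty} \frac{\exp(i|t_1|z)}{(z^2 + \nu)^m}\,dz = \frac{2\pi\exp(-|t_1|\sqrt{2l-1})}{(2\sqrt{2l-1})^{2m-1}} \sum_{j=1}^{m} \binom{2m-1-j}{m-j} \frac{(2|t_1|\sqrt{2l-1})^{j-1}}{(j-1)!}.
\]
It means that we can write
\[
A_1(t_1) = C_4 \exp(-|t_1|\sqrt{2l-1}) \sum_{u=0}^{l-1} a_u |t_1|^u,
\]
where $C_4 = \dfrac{2\pi}{(2\sqrt{2l-1})^{2l-1}}$ and $a_u = \dbinom{2l-u-2}{l-u-1}\dfrac{(2\sqrt{2l-1})^u}{u!}$.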

Simultaneously, as $A_3(t_1) = A_1(t_1) - \nu \int_{\mathbb{R}} \frac{\exp(i|t_1|z)}{(\nu + z^2)^{(\nu+3)/2}}\,dz$, we can write
\[
A_3(t_1) = C_4 \exp(-|t_1|\sqrt{2l-1}) \sum_{u=0}^{l} b_u |t_1|^u,
\]
where $b_u = \Big[\dbinom{2l-u-2}{l-u-1} - \dfrac{1}{4}\dbinom{2l-u}{l-u}\Big]\dfrac{(2\sqrt{2l-1})^u}{u!}$ as $0 \le u \le l-1$, and $b_l = -\dfrac{1}{4}\dfrac{(2\sqrt{2l-1})^l}{l!}$.

It is not hard to notice that $a_0, a_{l-1}, b_l \ne 0$.
u\ 4 l\ 


Now, for all $t_1 \in \mathbb{R}$, equation (82) can be rewritten as:


k 

E 

j=i L 


l-i 


a,' 




ti=0 


ii=0 


ex' 


— ajV2l — l|fi|) = 0, 


where a” = a'- + Csif^ M^) - 




C2{EMl 


" ' 1=1 


(7 a 


"), /?; = Q 




CJo 


, and 7 ^. = 


cr 




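The above equation yields that for all $t_1 \ge 0$
\[
\sum_{j=1}^{k} \Big[\big(\alpha''_j + \beta''_j(it_1)\big) \sum_{u=0}^{l-1} a_u \sigma_j^u t_1^u + \gamma''_j \sum_{u=0}^{l} b_u \sigma_j^u t_1^u\Big] \exp\big(it_1\theta'_j - \sigma_j\sqrt{2l-1}\,t_1\big) = 0. \tag{83}
\]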


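Using the Laplace transformation on both sides of (83) and denoting $c_j = \sigma_j\sqrt{2l-1} - i\theta'_j$ for $1 \le j \le k$, we obtain that as $\mathrm{Re}(s) > \max_{1 \le j \le k}\{-\sigma_j\sqrt{2l-1}\}$
\[
\sum_{j=1}^{k} \Big[\alpha''_j \sum_{u=0}^{l-1} \frac{a_u \sigma_j^u\,u!}{(s + c_j)^{u+1}} + i\beta''_j \sum_{u=1}^{l} \frac{a_{u-1}\sigma_j^{u-1}\,u!}{(s + c_j)^{u+1}} + \gamma''_j \sum_{u=0}^{l} \frac{b_u \sigma_j^u\,u!}{(s + c_j)^{u+1}}\Big] = 0. \tag{84}
\]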
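The passage from (83) to (84) uses only the elementary Laplace transform of $t^u e^{-ct}$, which equals $u!/(s + c)^{u+1}$ whenever $\mathrm{Re}(s + c) > 0$, applied here with a complex constant $c$. A minimal numerical sketch of that single fact follows; the truncation point, step count, and test values are arbitrary choices made for illustration.

```python
import cmath
import math


def laplace_closed_form(u, s, c):
    """u! / (s + c)^(u + 1), valid when Re(s + c) > 0."""
    return math.factorial(u) / (s + c) ** (u + 1)


def laplace_numeric(u, s, c, upper=60.0, steps=60_000):
    """Trapezoidal approximation of the integral of t^u exp(-(s + c) t) over [0, upper]."""
    h = upper / steps
    total = 0j
    for n in range(steps + 1):
        t = n * h
        weight = 0.5 if n in (0, steps) else 1.0
        total += weight * (t ** u) * cmath.exp(-(s + c) * t)
    return total * h


if __name__ == "__main__":
    s, c = 0.5, 1.0 - 0.7j  # c plays the role of sigma * sqrt(2l - 1) - i * theta'
    for u in (0, 1, 3):
        print(u, laplace_closed_form(u, s, c), laplace_numeric(u, s, c))
```

Because each term of (84) has a pole of known order at $s = -c_j$, multiplying by a power of $(s + c_1)$ and letting $s \to -c_1$ isolates the coefficients attached to the first component.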


Without loss of generality, we assume that $\sigma_1 \le \sigma_2 \le \ldots \le \sigma_k$. It demonstrates that $-\sigma_1\sqrt{2l-1} = \max_{1 \le j \le k}\{-\sigma_j\sqrt{2l-1}\}$. Denote $a_u^{(j)} = a_u\sigma_j^u$ and $b_u^{(j)} = b_u\sigma_j^u$ for all $u$. By multiplying both sides of (84) with $(s + c_1)^{l+1}$, as $\mathrm{Re}(s) > -\sigma_1\sqrt{2l-1}$ and $s \to -c_1$, we obtain $|i\beta''_1\,l!\,a_{l-1}^{(1)} + \gamma''_1\,l!\,b_l^{(1)}| = 0$, or equivalently $\beta''_1 = \gamma''_1 = 0$ since $a_{l-1}^{(1)}, b_l^{(1)} \ne 0$. Likewise, multiplying both sides of (84) with $(s + c_1)^{l}$ and using the same argument, as $s \to -c_1$, we obtain $\alpha''_1 = 0$. Overall, we obtain $\alpha''_1 = \beta''_1 = \gamma''_1 = 0$. Continuing in this fashion, we get $\alpha''_j = \beta''_j = \gamma''_j = 0$ for all $1 \le j \le k$.

As a consequence, for all 1 < j < A, we have 






=0, 


Z=1 


u,v 


(7a 


(7a 


= 0 , 




(7 a 


Since Y ^uv^y‘^t'\u\^y'^t']v = = {t'Yjjt', it is equivalent that 

U,V 

d 

dj + C3{J2 K) = 0> = 0 , and = 0. 


z=i 
With the same argument as the last paragraph of part (a) of Theorem 3.4, we readily obtain that $\alpha'_j = 0$, $\beta'_j = 0 \in \mathbb{R}^d$, and $\gamma_j = 0 \in \mathbb{R}^{d \times d}$. From the formation of $\alpha'_j$ and $\beta'_j$, it follows that $\alpha_j = 0$, $\beta_j = 0 \in \mathbb{R}^d$, and $\gamma_j = 0 \in \mathbb{R}^{d \times d}$ for all $1 \le j \le k$.


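(c) Assume that we can find $\alpha_i \in \mathbb{R}$, $\beta_i \in \mathbb{R}^d$, $\eta_i \in \mathbb{R}^d$, and symmetric matrices $\gamma_i \in \mathbb{R}^{d \times d}$ such that:
\[
\sum_{i=1}^{k} \alpha_i f(x|\theta_i,\Sigma_i,\lambda_i) + \beta_i^\top \frac{\partial f}{\partial \theta}(x|\theta_i,\Sigma_i,\lambda_i) + \mathrm{tr}\Big(\Big(\frac{\partial f}{\partial \Sigma}(x|\theta_i,\Sigma_i,\lambda_i)\Big)^\top \gamma_i\Big) + \eta_i^\top \frac{\partial f}{\partial \lambda}(x|\theta_i,\Sigma_i,\lambda_i) = 0, \tag{85}
\]
where $\theta_i \in \mathbb{R}^d$, $\Sigma_i \in S_d^{++}$, and $\lambda_i \in \mathbb{R}^{d,+}$.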

From the formation of /, we have / = fy^fz, where /y (x| 0 , E) 

g{x) = a/(i/+x)("+'')/ 2 , a = r(^)WVr(^)7r''/2, fz{x\X') 
where 61,..., G N are fixed number and A' G R'^’"''. 


j^p2 5'((a;-6')'^E ^{x-9)), 

exp(-A'xi).l{^,>0} 


Denote 


4>zit\x) = j eyi^{it^x)fz{x\X)dx. Multiplying both sides of ( 1 ^ with exp(ff^a:) and take 


the integral in R“ , we have following results: 

k 


J ex.p{it'^x)f{x\9j,T,j,Xj) = ^^ajaz{t\Xj) J exp{it^x)fY{x\9j,T,j)dx. 

i=i jjd i=i jjd 

k h 

^ J exp{if'x)/3^ -^{x\9j,X:j, Xj)dx = '^az(t\Xj) J exp(ft'^x)/3j’-^(x|6lj, Ej)d 


I exp(ff^x)tr((^(x|0j,Ej,Aj)%)dx = ^cTz(t|Aj) / exp(if^x) tr((^(x|0j, Ej))%)dx 


i=i 

d 


dfy, 


3 — 


^5E 


/ exp(if'^x)7j'^(x|6lj, Ej, Aj)dx = / exp(if'^x)/y (x|0j, Ej)dx x 


i=i 

d 


aE 




9A 


X f exp{it^x)r]J^^{x\Xj)dx. 


dX 


Therefore, under this transformation, equation ( 1851 ) can be rewritten as 

k 


'Y^azit\Xj){aj / exp{idx)fY{x\9j,T,j)dx + 


f=i 


J ex.p{if'x)pJ^^{x\9j,T,j)dx + J ex.p{idx)ti{{^^{x\9j,T.j))'^'yj)dx) + 


exp(zf^x)/y (x| 0 j, Ej)dx / exp(zf'^x)7j-^^(x| Aj)dx = 0 . 


( 86 ) 
Using (IMT) . we have 


J exp{it^ x) fY{x\6j ,T,j)dx = Ci^Ciexp{it^9jAi{\\'E,y‘^t\\)) 

R‘i 


and 

k 


^ [ (o:jfY{x\6j,T.j) + Pj^^{x\ej,T,j) + exp{it^x)ti{{^^{x\ej,J:j))^-fj) \ exp{if'x)dx = 
i=iw ^ ^ 


k 

Ec„ 

i=i 


a'j + Cztv{Mj) - 






+ 


1 ^ + 1 J 

C 2 t^'yjt 


- 4 i(l|S‘'''^t||) 




exp{it^9j) + 
exp{it^9j). 


, / /X f exD{i\t'\z) , . / /X f exp(i\t'\z)z'^ , „ , ^ , , 

where Ai{t) = j Aglt ) = J t G K, and a^. = a, - 




/3' = ^^S-V2/5 and M,- = 


ly + d^- 


1/2 ^-1/2 


J 2 " “ 2 ""2 


Denote /z;(x/|A9 = exp(-A;a:/).l{3,,>o} and 4>Zi{t\X'i) = J exp{itxi)fziixi\X'i)dxi 


as 


A; G M, we obtain 


d d (Xl.)bi 

^z{t\x,)= n</>z,(x/iA^.)=n ^_^, ^ 


where Aj = (A],..., Xj). 

Additionally, by denoting rjj = (rjj,..., rj'^) 


J exp{it^x)r]J^{x\Xj)dx = '^VjYl^Zu{tu\XJ) J exp{itiXi)^^{x\X^j)dxi 

Rd R ^ 


1=1 u^l 

* ^ JJ; (A“ - ' 
k d 

Multiplying both sides of equation (1^ with H 0 ~ we obtain: 

j=lu=l 


E 

j=i L 








eyip{id0j) ]J (A“)^“(A“ JJ ]J (A“ - - iCi exp(if'^6»j)^i(||s]'^^t||) x 


U=1 


l^j u=l 


E n 1 ^?)*“*' n (^“ - '*“> n n (^“ - =». <8^) 


/=1 


u^l 


u^l 


/ “=i 


d 

where u'- = a' + C' 3 (X] ^u)- Using the same argument as that of multivariate generalized Gaus- 
z=i 

sian distribution, we can find set D being the union of finite hyperplanes and cones such that as 
t' ^ D, ((f')^0i, (f')^Sif'),..., Ok, {t')'^T,kt') are pairwise different. Denote t = fif', where 
fi G i? and t' ^ D and O'- = {t'Y^Oj, cr| = (f')^Ejf'. For all ti > 0, using the result from mul- 

Zi-l 

tivariate Student’s t-distribution, we can denote Ai{ti) = exp(—fi-y/F) ^ and A 3 (ti) = 

u=0 


h 

CJ exp(-fi^) x; bJi, where u = 2li - 1 and oq, ai^_i, bo, / 0. 

u=0 


Define 


/Zi-l 

( E 

V«=o 


n (A“)'“(A“ 

U = 1 


<u) n n (Ar 

u=l 


mi 

c^ti, where mi 

u=0 


^1 + 


d 

d — 2 + {d+ Y bu){k — 1). Additionally, we define 

U=1 


mi+1 


y; die; := (J^Mf) 

l^j U = 1 


u=0 


\u=0 


u=l 


and 


mi + l 


^Zi-1 




u=l 


\u=0 


1=1 


u^l 


u^l 


) w “=1 


Equation (l87l) can be rewriffen as 


k 

E 

j=l L 


mi 


mi+1 mi+1 

(a" + I5”{iti)) citl + 7 " Y 

u=0 u=0 u=l 


exp{i0'jti — CTjy/v) = 0 , ( 88 ) 


" I rd r H.T \ U 3 (t')^ 7 ,f' II C^/ b'YP'j „ C 2 {t'Y'Jjt' 

where = a'- + C 3 tr(M, )- ^ ’ f^j = —^ 7 X 1 —^ 


i^ + l 


O’" 


j j 

Without loss of generality, we assume $\sigma_1 \le \sigma_2 \le \ldots \le \sigma_k$. Denote $h_j = \sigma_j\sqrt{\nu} - i\theta'_j$ and apply the Laplace transformation to (88); we obtain that as $\mathrm{Re}(s) > -\sigma_1\sqrt{\nu}$


mi 


E“;'E 


ciul 


mi + l 


i=i 


u=0 


(s + hj)^+^ 


+=/5; E 


j "»i+i 


dlul 


^ (s + /ij)“+i 


w =0 

mi 




(s + 

elul 


^ (s + 


0. (89) 


Using the same argument as that of multivariate Student’s t-distribution, by multiplying both sides of 
equation (IMt with (s + and let s —)• —hi, we obtain + 7idmi+il = 0- Since 


1 , ('^+E bu)ik—l)+d ,, , 

4=H) “=° 

U=1 


' {iy — t) + l 


and 


“mi 


,('^+E bn)(k—l)+d 

(-^) bi^ 




U=1 


the equation + 7 idmi+il = 0 is equivalent to |i/3j^'aq_i + ^ihi^\ = 0 , which yields that 

= Jik^ = 0. As aq_i, bi^ 7 ^ 0, we obtain jd'l = 7 ^ = 0. 


With this result, we multiply two sides of (l89l ) with (s + and let s —hi, we obtain 

ki^mi — I = 0. Then, we multiply both sides of (l89l ) with (s + hi)"^^ and let s ^ —hi, we 

get — 1)! — iCiel^^_i{mi — 1)!| = 0. Repeat this argument until we obtain |q;iCq| = 0 

and \aic\ — iCie\\ = 0, which implies that Oi[ = 0 as Cq = oq 0 0 / 0 and e\ = 0. 

l = l U=1 

From the formation of e\, it yields that 

»o (E’('i*7(Ai)‘‘-‘ n (Ar)‘“+M n n =»■ 

U^l J 1^1 U=1 

As ao 7 ^ 0, it implies that 


d 


1=1 u^l 


= 0 . 


Denote r][bi{X{)^‘ ^ H for all 1 < ( < d then we have ^ = 0. If there is 

U^l 1 = 1 

d 

any 'ijj[ 7 ^ 0 , by choosing t' to lie outside that hyperplane, we will not get the equality ^ = 0 . 

1=1 

Therefore, ipi = 0 for all 1 < ( < d, which implies that = 0 for all 1 < ( < d or equivalently 
7/1 = 0. Repeating the above argument until we obtain = Pj = Ij = 0 € M and 77 ^ = 0 G 
for all 1 < j < k. From the formation of a”,/ 3 ”, 7 ”, using the same argument as that of multivari¬ 
ate Student’s t-distribution, by choosing t' appropriately, we will have aj = 0, = 0 € M'^, and 

7 j = 0 G for all 1 < j <k. 
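(d) Assume that we can find $\alpha_j \in \mathbb{R}$, $\beta_j \in \mathbb{R}^d$, symmetric matrices $\gamma_j \in \mathbb{R}^{d \times d}$, $\eta_j \in \mathbb{R}^d$ and $\tau_j \in \mathbb{R}^d$ such that
\[
\sum_{j=1}^{k} \alpha_j f(x|\theta_j,\Sigma_j,a_j,b_j) + \beta_j^\top \frac{\partial f}{\partial \theta}(x|\theta_j,\Sigma_j,a_j,b_j) + \mathrm{tr}\Big(\Big(\frac{\partial f}{\partial \Sigma}(x|\theta_j,\Sigma_j,a_j,b_j)\Big)^\top \gamma_j\Big) + \eta_j^\top \frac{\partial f}{\partial a}(x|\theta_j,\Sigma_j,a_j,b_j) + \tau_j^\top \frac{\partial f}{\partial b}(x|\theta_j,\Sigma_j,a_j,b_j) = 0. \tag{90}
\]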


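Denote by $Z = (Z_1, \ldots, Z_d)$ the random vector with independent coordinates $Z_j \sim \mathrm{Gamma}(a_j, b_j)$. Let $\phi_{Z_j}(t_j|a_j,b_j)$ be the moment generating function of $Z_j$; then $\phi_{Z_j}(t_j|a_j,b_j) = b_j^{a_j}/(b_j - t_j)^{a_j}$ as $t_j < b_j$. Therefore, the moment generating function $\phi_Z(t|a,b)$ of $Z$ is $\prod_{j=1}^{d} \dfrac{b_j^{a_j}}{(b_j - t_j)^{a_j}}$ as $t_j < b_j$ for all $1 \le j \le d$.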
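The closed form $b^a/(b - t)^a$ for the Gamma moment generating function, and its product over coordinates, can be checked numerically. The sketch below assumes the shape and rate parametrization of the Gamma distribution, which is the one consistent with the formula above, and independent coordinates; all function names are illustrative.

```python
import math


def gamma_mgf_closed_form(t, a, b):
    """(b / (b - t))^a for a Gamma(shape a, rate b) variable, defined for t < b."""
    if t >= b:
        raise ValueError("the moment generating function requires t < b")
    return (b / (b - t)) ** a


def gamma_mgf_numeric(t, a, b, upper=80.0, steps=200_000):
    """Midpoint-rule approximation of E[exp(t Z)] for Z ~ Gamma(shape a, rate b)."""
    h = upper / steps
    norm = b ** a / math.gamma(a)
    total = 0.0
    for n in range(steps):
        x = (n + 0.5) * h
        total += x ** (a - 1) * math.exp(-(b - t) * x)
    return norm * total * h


def joint_mgf(t_vec, a_vec, b_vec):
    """Product of coordinate-wise MGFs, valid under the independence assumption."""
    result = 1.0
    for t, a, b in zip(t_vec, a_vec, b_vec):
        result *= gamma_mgf_closed_form(t, a, b)
    return result


if __name__ == "__main__":
    print(gamma_mgf_closed_form(1.0, 2.5, 3.0), gamma_mgf_numeric(1.0, 2.5, 3.0))
    print(joint_mgf([1.0, -0.5], [2.5, 1.0], [3.0, 2.0]))
```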

j=l (bj - tj) :> 

Multiply both sides of (l90l ) with exp(t^x) and take the integral in using the same argument as 
that of multivariate generalized Gaussian case, we obtain that as ti < min \ b) > for all 1 < f < /c 

1 < 7 <A: I 


Y,{aj + I3jt + ^ + log 


J-h 


i=i 


1=1 




exp(f^6»j + ]-t^Z,jt) ^ 




^ (b) - 


= 0 . 


kd ^ 

Multiply both sides of the above equation with n n (&L — we can rewrite it as 

u=l i=l 


Y. ^ + Yi i°g 




d 


i=i 


l=l 


d 


If ("i - ‘• 


^ rjajt, n ('>? - *«)) exp{t’’e, + -f'T.jt) (b]Y'i ff 11 


1 < + 1 


= 0. (91) 


i=i 


U^l 


Z=1 


*=i 


Put t = tit' as fi E M and t' E We can find sef D, which is fhe finife union of hyperplanes and 
cones such fhaf as t' ^ D and t' E we gef fhaf {{t')'^6i, (f')^Sif'),..., {{t')'^9k, (P)^SfcP) 

are pairwise differenf. Therefore as f* < min |^j| for all 1 < i < /c, we gef fi < f* = 


min ^ ^ f • Denofe 6'- = d'Oj and cr^ = t^Tjjt, as ti <t*, we can rewrife (|9TI ) as follows 


i<j<k,l<i<d I t] 
k 


^ Oj + hfdjt' + tl 




bi 


+E’';>°eLrA7r ny-;-*;*!) 


i=i 


1=1 


d 


Y n exp(0'fi+^) Yi n n 


<+i 


= 0. (92) 


i=i 


U^l 


Z=1 


*=i 


Without loss of generality, we assume that $\sigma_1 \le \sigma_2 \le \ldots \le \sigma_k$. By using the same argument as that of the multivariate generalized Gaussian distribution in Theorem 3.4, we denote $\bar{i}$ to be the minimum index such that $\sigma_{\bar{i}} = \sigma_k$ and $i_1$ as the index such that $\theta'_{i_1} = \min_{\bar{i} \le j \le k}\{\theta'_j\}$. Multiply both sides of


n-2 /2 


with exp(—0' ti -——) and let ti —)■ —oo, using the convergence argument of generalized Gaussian 

k 2 

case, we eventually obtain as ti —oo 




+ log 


1=1 


\ - *!*i, 




i=l 


E4<‘:*inw. n n ("t - 

1 = 1 U^l i = l ^—1 




Since n 0 U (K - 

i=l u^i}^ 2=1 

— OO, 


+ooasti —)> —oo, the above result implies that as fi 


B{ti)= (04 + + 4k log 




1=1 




n - Y. n K - 

u^l 


(93) 


2=1 


1 = 1 


Note that the highest degree in terms of ti in B{ti) is d + 2 and its corresponding coefficient is 
d (t'A'Y- t' 

(-ir n . As B{ti) —)■ 0 as ti ——oo, it implies that {t'AjiA = 0, which yields that 

i=i 2 


d 


7 ij, = 0 under appropriate choice of t'. Similarly, the coefficient of in B{t\) is (—1)“ 0 mA- 
Therefore, jdf t' = 0, which implies that = 0. With these results, from (1931) . we see that 


2=1 


Y ^ik log(l'l - ] n (K - i'iti) ^0 as fi 


—oo. 


,«=1 


i=l 


It follows that r]l = 0 for all 1 < f < d. Now, the coefficient of in B{ti) is ai^. H K ; therefore. 


2=1 


it implies that = 0. Last but not least, the coefficient of ti now is — X] 0 Thus, we 

/=! " u^l 

d 

have ^ n ~ H" appropriate choice of t', we obtain = 0 for all 1 < Z < d. 

Repeat the above argument until we get a* = 0, /3j = = 0 E M'^, and 7 ^ = 0 E which 

yields the conclusion of our theorem. 


Lemma 7.3. For any $m \in \mathbb{N}$, we have
\[
\int_{-\infty}^{+\infty} \frac{\exp(itx)}{(x^2 + 1)^m}\,dx = \frac{2\pi\exp(-|t|)}{2^{2m-1}} \sum_{j=1}^{m} \binom{2m-1-j}{m-j} \frac{(2|t|)^{j-1}}{(j-1)!}. \tag{94}
\]
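Proof. Assume that $t > 0$ and for any $R > 1$, we define $C_R = I_R \cup \Gamma_R$, where $\Gamma_R$ is the upper half of the circle $|z| = R$ and $I_R = \{z \in \mathbb{C} : |\mathrm{Re}(z)| \le R \text{ and } \mathrm{Im}(z) = 0\}$. Now, we have the following formula:
\[
\oint_{C_R} \frac{\exp(itz)}{(z^2 + 1)^m}\,dz = \int_{I_R} \frac{\exp(itz)}{(z^2 + 1)^m}\,dz + \int_{\Gamma_R} \frac{\exp(itz)}{(z^2 + 1)^m}\,dz.
\]
Notice that $\int_{I_R} \frac{\exp(itz)}{(z^2 + 1)^m}\,dz = \int_{-R}^{R} \frac{\exp(itx)}{(x^2 + 1)^m}\,dx$, therefore
\[
\oint_{C_R} \frac{\exp(itz)}{(z^2 + 1)^m}\,dz = \int_{-R}^{R} \frac{\exp(itx)}{(x^2 + 1)^m}\,dx + \int_{\Gamma_R} \frac{\exp(itz)}{(z^2 + 1)^m}\,dz.
\]
Regarding the term $\oint_{C_R} \frac{\exp(itz)}{(z^2 + 1)^m}\,dz$, from the residue theorem, we have
\[
\oint_{C_R} \frac{\exp(itz)}{(z^2 + 1)^m}\,dz = 2\pi i\,\mathrm{Res}_{z=i}\Big(\frac{\exp(itz)}{(z^2 + 1)^m}\Big) = \frac{2\pi i}{(m-1)!} \lim_{z \to i} \frac{d^{m-1}}{dz^{m-1}}\Big(\frac{\exp(itz)}{(z + i)^m}\Big).
\]
By direct calculations, we obtain
\[
\lim_{z \to i} \frac{d^{m-1}}{dz^{m-1}}\Big(\frac{\exp(itz)}{(z + i)^m}\Big) = \frac{\exp(-t)}{i} \sum_{j=1}^{m} \frac{(2m-j-1)!}{(m-1)!\,2^{2m-1}} \binom{m-1}{j-1} (2t)^{j-1}.
\]
Thus, it yields that
\[
\oint_{C_R} \frac{\exp(itz)}{(z^2 + 1)^m}\,dz = \frac{2\pi\exp(-t)}{((m-1)!)^2} \sum_{j=1}^{m} \frac{(2m-j-1)!}{2^{2m-1}} \binom{m-1}{j-1} (2t)^{j-1} = \frac{2\pi\exp(-t)}{2^{2m-1}} \sum_{j=1}^{m} \binom{2m-1-j}{m-j} \frac{(2t)^{j-1}}{(j-1)!}.
\]
Additionally, since $|\exp(itz)| \le 1$ on $\Gamma_R$ when $t > 0$,
\[
\Big|\int_{\Gamma_R} \frac{\exp(itz)}{(z^2 + 1)^m}\,dz\Big| \le \int_{\Gamma_R} \frac{|\exp(itz)|}{|z^2 + 1|^m}\,|dz| \le \frac{\pi R}{(R^2 - 1)^m} \to 0 \quad \text{as } R \to \infty.
\]
As a consequence, as $t > 0$, by letting $R \to \infty$, we get:
\[
\int_{-\infty}^{+\infty} \frac{\exp(itx)}{(x^2 + 1)^m}\,dx = \frac{2\pi\exp(-t)}{2^{2m-1}} \sum_{j=1}^{m} \binom{2m-1-j}{m-j} \frac{(2t)^{j-1}}{(j-1)!}.
\]
For the case $t < 0$, notice that $\int_{-\infty}^{+\infty} \frac{\exp(itx)}{(x^2 + 1)^m}\,dx = \int_{-\infty}^{+\infty} \frac{\exp(-itx)}{(x^2 + 1)^m}\,dx$, so we achieve
\[
\int_{-\infty}^{+\infty} \frac{\exp(itx)}{(x^2 + 1)^m}\,dx = \frac{2\pi\exp(t)}{2^{2m-1}} \sum_{j=1}^{m} \binom{2m-1-j}{m-j} \frac{(-2t)^{j-1}}{(j-1)!}.
\]
The lemma is proved completely.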
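As a sanity check on formula (94), the sketch below evaluates its right-hand side and compares it with a direct numerical approximation of the integral. The quadrature (a trapezoidal rule on a truncated interval) and the test values of $t$ and $m$ are assumptions made only for illustration. For $m = 1$ the formula reduces to the classical value $\pi e^{-|t|}$.

```python
import math


def lemma_closed_form(t, m):
    """Right-hand side of (94)."""
    series = sum(
        math.comb(2 * m - 1 - j, m - j) * (2 * abs(t)) ** (j - 1) / math.factorial(j - 1)
        for j in range(1, m + 1)
    )
    return 2 * math.pi * math.exp(-abs(t)) / 2 ** (2 * m - 1) * series


def lemma_numeric(t, m, half_width=100.0, steps=200_000):
    """Trapezoidal rule for cos(t x) / (x^2 + 1)^m on [-half_width, half_width].

    The imaginary part of exp(i t x) integrates to zero by symmetry, and the
    truncated tails are negligible for m >= 2.
    """
    h = 2 * half_width / steps
    total = 0.0
    for n in range(steps + 1):
        x = -half_width + n * h
        weight = 0.5 if n in (0, steps) else 1.0
        total += weight * math.cos(t * x) / (x * x + 1.0) ** m
    return total * h


if __name__ == "__main__":
    for t, m in [(0.7, 2), (-1.3, 3)]:
        print(t, m, lemma_closed_form(t, m), lemma_numeric(t, m))
```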
PROOF OF THEOREM im (Continued). We present here the proof for general $d \ge 1$. This proof is similar to the case $d = 1$, with extra care for handling matrix-variate parameters. For any sequence $G_n \in \mathcal{O}_{k,c_0}(\Theta \times \Omega) \to G_0$ in $W_r$, we can denote $G_n = \sum_{i=1}^{k_0}\sum_{j=1}^{s_i} p_{ij}^n \delta_{(\theta_{ij}^n, \Sigma_{ij}^n)}$, where $(p_{i\cdot}^n, \theta_{ij}^n, \Sigma_{ij}^n) \to (p_i^0, \theta_i^0, \Sigma_i^0)$ with $p_{i\cdot}^n = \sum_{j=1}^{s_i} p_{ij}^n$, for all $1 \le i \le k_0$ and $1 \le j \le s_i \le k - k_0 + 1$. Let $N$ be any positive integer. For any $r \ge 1$ and for each $x \in \mathbb{R}^d$, by means of a Taylor expansion up to order $N$, we obtain

ko Si ko+m 

pgM-pg.W) = y; (p”- p»)/(i|«?,e») 

2=1 j = l 2=1 

^0 Si N „|„| \n 0 

= + EEpB E (A«.”)“‘(AE5r— ‘ + 


i=l j=l |o|=l 


a\ 


ko 


where p" = E P?p M^) = E (p^ - P°)/(x|0O, S^), = 0- - 0°, AS- = S- - S^ for 


i=i 


i=l 


ko Si 


all 1 < i < feo, 1 < j < Si, and Ri{x) < 0(E E Additionally, 


i=lj=l 








a = (ai, 02 ), where ai = {a\, E N'^, a 2 = (aly)uv G |q;| = E “i + E ^^d 

2=1 l<22,2;<d 

«! = n Oijl n alJ. Moreover, (A^-)"! = 0 and (AS-.)"^ = 0 (AS-.)"r 

2 = 1 l< 22 , 2 ;<d / = ! l<u,v<d 

where (.)/ denotes the Z-th component and {.)uv denotes the element in u-th row and u-th column. 
Finally, Z)l“l/(x|0O, S?) = 


50“i9S“2 d 

n 90;' n dj: 

1=1 l<u,v<d 


2 

uv 

UV 


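From Lemma im we have the identity $\dfrac{\partial^2 f}{\partial \theta^2}(x|\theta,\Sigma) = 2\dfrac{\partial f}{\partial \Sigma}(x|\theta,\Sigma)$ for all $\theta \in \mathbb{R}^d$ and $\Sigma \in S_d^{++}$. Therefore, for any $\alpha = (\alpha_1, \alpha_2)$, we can check that
\[
\frac{\partial^{|\alpha|} f}{\partial \theta^{\alpha_1}\partial \Sigma^{\alpha_2}} = \frac{1}{2^{|\alpha_2|}} \frac{\partial^{|\beta|} f}{\partial \theta^{\beta}}, \tag{96}
\]
where $\beta_l = \alpha_l^1 + \sum_{j=1}^{d} \alpha_{lj}^2 + \sum_{j=1}^{d} \alpha_{jl}^2$ for all $1 \le l \le d$, which means $|\beta| = |\alpha_1| + 2|\alpha_2|$. This equality means that we can convert all the derivatives involving $\Sigma$ into derivatives with respect to $\theta$ only. Therefore, we can rewrite (95) as follows: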

2 l“2la^!a2! 9P 


^ (A95r(AE5)«8lffl/, 

PgAx)-PGo{x) = 2^2^PijX^ - nU.i.. , - 

i=l j=i |/3|>i 

+ Ai(x)+i?i(x) 

:= ^i(x) + i?i(x) + i?i(x), 


(97) 


where $\beta$ is defined as in equation (96).
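The conversion of $\Sigma$-derivatives into $\theta$-derivatives rests on the Gaussian identity relating the second location derivative to the first covariance derivative. In the univariate case it reads $\partial^2 f/\partial\theta^2 = 2\,\partial f/\partial v$, where $v$ denotes the variance. The sketch below checks this special case by finite differences; the step sizes and the evaluation point are arbitrary illustrative choices.

```python
import math


def normal_pdf(x, theta, var):
    """Univariate Gaussian density with mean theta and variance var."""
    return math.exp(-(x - theta) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def second_location_derivative(x, theta, var, h=1e-3):
    """Central second difference of the density in the location parameter."""
    return (normal_pdf(x, theta + h, var)
            - 2 * normal_pdf(x, theta, var)
            + normal_pdf(x, theta - h, var)) / h ** 2


def variance_derivative(x, theta, var, h=1e-5):
    """Central first difference of the density in the variance parameter."""
    return (normal_pdf(x, theta, var + h) - normal_pdf(x, theta, var - h)) / (2 * h)


if __name__ == "__main__":
    x, theta, var = 0.4, -0.2, 1.3
    print(second_location_derivative(x, theta, var))
    print(2 * variance_derivative(x, theta, var))
```

This linear dependence among derivatives is what makes the location-covariance Gaussian family only weakly identifiable, and it is the reason the expansion can be rewritten purely in terms of $\partial^{|\beta|} f/\partial\theta^{\beta}$.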

Now, we proceed to proving part (a) of the theorem. From the hypothesis for $\bar{r}$, we have non-trivial solutions $(x_i^*, a_i^*, b_i^*)_{i=1}^{k-k_0+1}$ for equation (8) when $r = \bar{r} - 1$. We choose the sequence of

k 

probability measures Gn = Y. Pl^ie^,T,^) i^?)i = (^i)i + = (^i)i for < j < d, 

2=1 
fc—fco+1 

(5^r)ii = i^i)uv = i^i)uv for iu,v) i- ( 1 , 1 ), = v\{x*fl Y1 when 

i=i 

1 < i < A; - /so + 1, and when k - ko + 2 < i < k. As 

n is sufficiently large, we still guarantee that S” are positive definite matrices asl<t<A: — /cq + I- 

/k-ko+i /|q*| 16*1 W'’ 

We can check that W{{Gn,Go) = ( 2 Pu ( —~ + ~~^) ) ^ 0 for all r > 1. Additionally, 

under this construction, si = A — feo + 1 , Si = 1 for all 2 < z < ko, = 0 

for all 1 < j < si, 2 < t < d and {u,v) (1,1), Ad". = 0 E AS". = 0 E for all 

k — ko + 2<i<k and 1 < j < Si. Now, by choosing N = r in[^ we obtain Ai{x) = 0 and 
sup \Ri{x)\/W!^{Gn, Go) —)■ 0. Moreover, we can rewrite Bi{x) in (|97] ) as follows 


k—ko-\-l r—1 


Bi{x) = 


Pu2_^ 2 ^ 


2=1 


' , 2"iiai!afi! 


fc—/ cq+I 

+ E pf.E E 




i=l 7>»'Q:J,of^ 




r—1 


R _ 2 L/ / iflO yO^ , 

7=1 7^^ 


(x|d?,S?) 


where 7 = a{ + 2a7- From the formation of G„, for each 1 < 7 < r — 1, 


fc—Alo+l 


- Cn^ 


^7 E E 

a^-\-2(y.^-^=a 


2=1 


(a*)7(67-^fi 


= 0 , 


/c —/co + 1 

where C = ^ (x*)^. As a consequence, B^^n/W^{Gn,Go) = 0 for all 1 < 7 < r — 1. Similarly, 

2=1 

for each 7 > f, 

/ /c —/cq + I \ ^ 

CWf^;(Gn,Go) = An2’'-7f ^ P?.(7<| + | 6 :i)j ^ 0 , 

(o^hk. 






and the last result is due to r < r. From now, it is straightfor- 


where A = Y1 

"i+2“?i=7 

ward to extend this argument to address the Hellinger distance of mixture densities in the same way as 
the proof for the case d = 1 . 


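We now turn to part (b). It suffices to show that (21) holds. Assume by contrary that it does not hold. Following the same argument as that of Theorem 3.2, we can find a sequence $G_n = \sum_{i=1}^{k_0}\sum_{j=1}^{s_i} p_{ij}^n \delta_{(\theta_{ij}^n, \Sigma_{ij}^n)} \in \mathcal{O}_{k,c_0}(\Theta \times \Omega) \to G_0$ in $W_{\bar{r}}$ as $n \to \infty$, and $G_n$ have exactly $k^*$ support points where $k_0 \le k^* \le k$. Additionally, $(p_{i\cdot}^n, \theta_{ij}^n, \Sigma_{ij}^n) \to (p_i^0, \theta_i^0, \Sigma_i^0)$ for all $1 \le i \le k_0$ and $1 \le j \le s_i \le k - k_0 + 1$. Denote
\[
d(G_n, G_0) = \sum_{i=1}^{k_0}\sum_{j=1}^{s_i} p_{ij}^n\big(\|\Delta\theta_{ij}^n\|^{\bar{r}} + \|\Delta\Sigma_{ij}^n\|^{\bar{r}}\big) + \sum_{i=1}^{k_0} |p_{i\cdot}^n - p_i^0|.
\]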
As we pointed out in the proof of Theorem 3.2, the assumption $(p_{G_n}(x) - p_{G_0}(x))/W_r^r(G_n, G_0) \to 0$ for all $x \in \mathbb{R}^d$ leads to $(p_{G_n}(x) - p_{G_0}(x))/d(G_n, G_0) \to 0$ for all $x \in \mathbb{R}^d$. Now, by combining this fact with (20) and choosing $N = r$, we obtain


(98) 


didif 

'dW 


(x|0,S) 


(Ai(x) + Bi{x) + Ri{x))/d{Gn, Go) —)• 0. 

Now, Ai(x)/fi(Gn, Go), Bi/d{Gn, Go) are just the linear combination of elements of 

d d 

where f5 is defined in equation (|9^ . i.e jdi = aj+Yl cufj + all 1 < f < d, \ j3\ = |q;i| +2|a2|> 

i=i i=i 

and |ai| + |q; 2 | < r. Therefore, it implies that 0 < \f5\ < 2r, which is the range of all possible 

dldlf 


values of |/?|. Denote Ep{6,'B) to be the corresponding coefficient of 


ded 


■(x|0,S). Assume that 


S?) —)• 0 for all 1 < f < and 0 < |/3| < 2r as n ^ oo. Using the result from (l20l) . the 
specific formula for E^{9^, S^) as |/3| > 1 is 






E 


(A()5)“>(ae 


n \a 2 
ijj 


2 l“2|Q;i!a2! 


MG„,Go). 


d d 

where ai,a 2 satisfies aj + ^ af- + X) = A for all 1 < f < d. 

i=i i=i 


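By taking the summation of all $|E_0(\theta_i^0, \Sigma_i^0)|$, i.e., $\beta = 0$, we get $\sum_{i=1}^{k_0} |p_{i\cdot}^n - p_i^0| / d(G_n, G_0) \to 0$ as $n \to \infty$. As a consequence, we get

$$\sum_{i=1}^{k_0} \sum_{j=1}^{s_i} p_{ij}^n \left( \|\Delta\theta_{ij}^n\|^r + \|\Delta\Sigma_{ij}^n\|^r \right) / d(G_n, G_0) \to 1 \quad \text{as } n \to \infty.$$

As $\|\cdot\|$ and $\|\cdot\|_r$ are equivalent, the above result also implies that

$$\sum_{i=1}^{k_0} \sum_{j=1}^{s_i} p_{ij}^n \left( \|\Delta\theta_{ij}^n\|_r^r + \|\Delta\Sigma_{ij}^n\|_r^r \right) / d(G_n, G_0) \not\to 0 \quad \text{as } n \to \infty.$$

Therefore, we can find an index $1 \le i^* \le k_0$ such that

$$\sum_{j=1}^{s_{i^*}} p_{i^*j}^n \left( \|\Delta\theta_{i^*j}^n\|_r^r + \|\Delta\Sigma_{i^*j}^n\|_r^r \right) / d(G_n, G_0) \not\to 0. \tag{99}$$

Without loss of generality, we assume $i^* = 1$. There are two cases regarding the above result: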


Si 


Casel: Thereexists 1 < re* < d and such that = J2 Piji\i^^ij)u*\^ + \{^'^ij)u*u*\^)/d{Gn,Go) 

i=i 

0. Without loss of generality, we assume re* = 1. With this result, for any |/3| > 1, we obtain 


^(AS 


Any- _ 

. _ E^{9\,E\) _ 0^2 2l«2lai!a2! 






Un 


Si 


0 . 


EPii(l(Adii)ir + |(AE-)n|A 

i=i 
Now, we choose $\alpha^1_l = 0$ for all $2 \le l \le d$ and $\alpha^2_{uv} = 0$ for all $(u, v) \ne (1, 1)$; then $|\beta| = \alpha^1_1 + 2\alpha^2_{11}$.
Therefore,




Si 

EPij 


E 


(A0-)inAS-)n“ 




fi! 


E Pii(l(A6»ij) 
f=i 


■+l(AS-)iil 


( 100 ) 


where $\alpha^1_1 + 2\alpha^2_{11} = |\beta|$ and $1 \le |\beta| \le 2r$.

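Denote $\overline{p}_n = \max_{1 \le j \le s_1} p_{1j}^n$ and $M_n = \max\{|(\Delta\theta_{11}^n)_1|, \ldots, |(\Delta\theta_{1s_1}^n)_1|, |(\Delta\Sigma_{11}^n)_{11}|^{1/2}, \ldots, |(\Delta\Sigma_{1s_1}^n)_{11}|^{1/2}\}$. Since $0 < p_{1j}^n/\overline{p}_n \le 1$ for all $1 \le j \le s_1$, we define $\lim_{n \to \infty} p_{1j}^n/\overline{p}_n = c_j^2$ for all $1 \le j \le s_1$. Similarly, define $\lim_{n \to \infty} (\Delta\theta_{1j}^n)_1/M_n = a_j$ and $\lim_{n \to \infty} (\Delta\Sigma_{1j}^n)_{11}/M_n^2 = 2b_j$ for all $1 \le j \le s_1$. Since $p_{1j}^n \ge c_0$ for all $1 \le j \le s_1$, all of the $c_j^2$ differ from 0 and at least one of them equals 1. Likewise, at least one of the $a_j$ or $2b_j$ equals $-1$ or $1$. Now, for $1 \le |\beta| \le \overline{r}$, divide both the numerator and denominator of $F_\beta(\theta_1^0, \Sigma_1^0)$ by $\overline{p}_n M_n^{|\beta|}$ and let $n \to \infty$; we obtain the following system of polynomial equations

$$\sum_{j=1}^{s_1} \sum_{\alpha^1_1 + 2\alpha^2_{11} = |\beta|} \frac{c_j^2 \, a_j^{\alpha^1_1} \, b_j^{\alpha^2_{11}}}{\alpha^1_1! \, \alpha^2_{11}!} = 0 \quad \text{for all } 1 \le |\beta| \le \overline{r}.$$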


As $2 \le s_1 \le k - k_0 + 1$, the hardest scenario is when $s_1 = k - k_0 + 1$. However, by hypothesis, when $s_1 = k - k_0 + 1$ the above system of polynomial equations does not have any non-trivial solution, which is a contradiction.
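To make the structure of this polynomial system concrete, the following is a minimal Python sketch (not part of the original proof) that evaluates its left-hand sides exactly with rational arithmetic, assuming the system has the form $\sum_j c_j^2 \sum_{n_1 + 2n_2 = |\beta|} a_j^{n_1} b_j^{n_2}/(n_1!\, n_2!) = 0$. The function name `gaussian_system_residuals` and the two-component candidate $(c_j, a_j, b_j)$ below are hypothetical choices made only for illustration.

```python
from fractions import Fraction
from math import factorial


def gaussian_system_residuals(c, a, b, r):
    """Return the left-hand sides of the system for |beta| = 1, ..., r.

    Entry beta is sum_j c_j^2 * sum_{n1 + 2*n2 = beta} a_j^n1 * b_j^n2 / (n1! n2!).
    """
    residuals = []
    for beta in range(1, r + 1):
        total = Fraction(0)
        for cj, aj, bj in zip(c, a, b):
            # Enumerate all pairs (n1, n2) of non-negative integers with n1 + 2*n2 = beta.
            for n2 in range(beta // 2 + 1):
                n1 = beta - 2 * n2
                term = Fraction(aj) ** n1 * Fraction(bj) ** n2
                total += Fraction(cj) ** 2 * term / (factorial(n1) * factorial(n2))
        residuals.append(total)
    return residuals


# Hypothetical two-component candidate (an illustration, not taken from the paper).
c = [1, 1]
a = [1, -1]
b = [Fraction(-1, 2), Fraction(-1, 2)]
print(gaussian_system_residuals(c, a, b, 4))
```

For this candidate the residuals for $|\beta| = 1, 2, 3$ vanish while the one for $|\beta| = 4$ does not. This only illustrates how raising the order adds constraints; it is not a proof that no solution exists at a given order.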


SI 


Case 2: There exists 1 < n* / n* < d such that Vn = Pijl (AS”^)^,*^,* |^/d(Gn, Go) -fr 0. 

i=i 

Without loss of generality, we assume u* = l,v* = 2. With this result, for any |/3| > 1, we obtain 


F'p{ei^\) = 


Ep{elYi) 


YPij E 


cti ,a2 


(Ad^^.)“i(AE7^.)“" 

2\°‘2\ai\a2l 


Vn 


51 


Ep?,|(AS-)i2 | 

f=l 


By choosing ai = 0 G N*^, = 0 for all {u, v) 0 {(1, 2), (2,1)}, then \/3\ = af 2 + ail- Therefore, 


E P?j E 


(AS 


n Wi 2 +“ 2 l 
IE 12 




J =1 


2l/5la?2!aii! 


51 


0 . 


EK,I(as-)i2| 

i=i 


Denote $\overline{p}_n = \max_{1 \le j \le s_1} \{p_{1j}^n\}$ and $M_n = \max_{1 \le j \le s_1} \{|(\Delta\Sigma_{1j}^n)_{12}|\}$. Then, we have $p_{1j}^n/\overline{p}_n \to c_j^2 > 0$ and $(\Delta\Sigma_{1j}^n)_{12}/M_n \to d_j$ for all $1 \le j \le s_1$. Again, at least one of the $d_j$ differs from 0. Now, by dividing both the numerator and denominator of $F'_\beta(\theta_1^0, \Sigma_1^0)$ by $\overline{p}_n (M_n)^2$ and letting $n \to \infty$, we obtain $\sum_{j=1}^{s_1} c_j^2 d_j^2 = 0$. This equation implies $d_j = 0$ for all $1 \le j \le s_1$, which is a contradiction.

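Therefore, at least one of the coefficients $E_\beta(\theta_i^0, \Sigma_i^0)$ does not converge to 0 as $n \to \infty$. Now, we denote by $m_n$ the maximum of the absolute values of $E_\beta(\theta_i^0, \Sigma_i^0)$ over the multi-indices $\beta$ defined earlier and $1 \le i \le k_0$, and let $d_n = 1/m_n$. As $m_n \not\to 0$ as $n \to \infty$, $d_n$ is uniformly bounded above for all $n$. As $d_n |E_\beta(\theta_i^0, \Sigma_i^0)| \le 1$, we denote $d_n E_\beta(\theta_i^0, \Sigma_i^0) \to \tau_{i\beta}$, where at least one of the $\tau_{i\beta}$ differs from 0. Combining these notations with the expansion above, we get that for all $x \in \mathbb{R}^d$,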


PGr.{x) - PGo{x) 

d(G„,Go) 


fco 


d\p\f 




i=l 


d9f^ 


= 0 . 


Using the technique from the proof of part (a) of Theorem 3.4, it is sufficient to demonstrate the above equation when $d = 1$. However, from the result when $d = 1$, we already know that $\tau_{i\beta} = 0$ for all $1 \le i \le k_0$ and $0 \le |\beta| \le 2r$, which is a contradiction. Therefore, the assertion of our theorem follows immediately.


PROOF OF PROPOSITION (Continued) The case $k - k_0 = 1$ was shown in Appendix I.
Here we consider the case $k - k_0 = 2$. As in the argument for the case $k - k_0 = 1$, we can find
$i^* \in \{1, 2, \ldots, k_0 + m\}$, where $0 \le m \le 2$, such that


FL{0%,v%) = 


... - FM , v%) 

3=1 


Epr*i E 

j=l ni,n2 






niln2' 


0 , 


( 101 ) 




3=1 


where $n_1 + 2n_2 = \alpha$ and $1 \le \alpha \le 6$. As $i^* \in \{1, 2, \ldots, k_0 + m\}$, we have $i^* \in \{1, \ldots, k_0\}$ or $i^* \in \{k_0 + 1, \ldots, k_0 + m\}$. Firstly, we assume that $i^* \in \{1, \ldots, k_0\}$. Without loss of generality, let $i^* = 1$. Since $s_1 \le k - k_0 + 1 = 3$, there are two possibilities.


Case 1. If Si < 2, Ihen since ^ ^ |A0y|^, we also oblain 

3=1 3=1 


Efi-i E -- /EpE^o 

3=1 


n |4 
i*j\ 


0 , 


j=l ni,n 2 

from which we easily get a contradiction by means of the argument of the case $k - k_0 = 1$.


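Case 2. If $s_1 = 3$, we assume WLOG that $p_{11}^n |\Delta\theta_{11}^n| \le p_{12}^n |\Delta\theta_{12}^n| \le p_{13}^n |\Delta\theta_{13}^n|$ for all $n$. With the same argument as that of the case $k - k_0 = 1$, we can get $\Delta\theta_{11}^n, \Delta\theta_{12}^n, \Delta\theta_{13}^n \ne 0$ for all $n$. Denote $a_1 := \lim_{n \to \infty} p_{11}^n \Delta\theta_{11}^n / p_{13}^n \Delta\theta_{13}^n \in [-1, 1]$ and $a_2 := \lim_{n \to \infty} p_{12}^n \Delta\theta_{12}^n / p_{13}^n \Delta\theta_{13}^n \in [-1, 1]$. By dividing both the numerator and denominator of $F'_1(\theta_1^0, v_1^0)$ by $p_{13}^n \Delta\theta_{13}^n$ and letting $n \to \infty$, we obtain $a_1 + a_2 = -1$. We have the following cases regarding $p_{11}^n/p_{13}^n$ and $p_{12}^n/p_{13}^n$: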
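Case 2.1: If both $p_{11}^n/p_{13}^n, p_{12}^n/p_{13}^n \to \infty$, then $\Delta\theta_{11}^n/\Delta\theta_{13}^n, \Delta\theta_{12}^n/\Delta\theta_{13}^n \to 0$. Since $\Delta\theta_{1i}^n \ne 0$, we denote $\Delta v_{1i}^n = h_i^n (\Delta\theta_{1i}^n)^2$ for all $1 \le i \le 3$. By dividing the numerator and denominator of $F'_\alpha(\theta_1^0, v_1^0)$ by $p_{13}^n (\Delta\theta_{13}^n)^\alpha$ for all $2 \le \alpha \le 6$, we obtain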


Kn,i — + ^3 + ^ K 


pUM 


n '\2 
li) 


2=1 

2 


p-g(A 0 


n 'i 2 

13i 


0 , 


Kn,2 = 


Kn,3 = 


-^n,4 — 


-^71,5 — 


2=1 


n \S 




0 , 


1 

-h — + 

4! 2 




+ Eb + t + 


1 ^ 


3 _|_ (4 


r 2'\2 

'3) 


2 = 1 
2 

+ E 

2 = 1 


4! 

1 hf 

-h — + 

5! 6 


) P?3(A0f3 


2 

{Klf 


1 M 
^ IT 


{h^zY , 


+ 


n'i3 

3i 


6 


+ E 

2 = 1 


1 W 


n ^4 
4 


\ K^(A^ 

) pU^e^zf 


0 , 


0 , 


+ ^ + 


{K: 


n\2 


6! 4! 


+ 


{h:Y\ pU^e^^ 


n \6 


6 J pU^e^z) 


0 . 


If |/i”|, |/i 2 M/ 13 1 —)■ OO then Kn,z > 1/4! as n is sufficiently large, which is a contradiction. Therefore, 
at least one of them is finite. If either |/ig | or / 12 I 7 ^ 00 , then we reduce to the case when si = 2, 
which eventually leads to a contradiction. Therefore, |/ii|,|/i 2 l I ^ 3 1 ~Y 00 . Now, K^z 


implies that (/i" 


n\2 ■ 


pU^o 


n \4 
13f 


7 ^ OO for all 1 < i < 2. As PiijPi^ —)■ 00 for all 1 < f < 2, we obtain 


f A0” ^ jpi 

^ 0. Combining these results with Kn ^4 and Kn^z, we obtain — H—^ + 


(h^Y 


(Ae^zY 

^ 1 h 

and — + 


0 


P 12 /P 


3 _|_ Yh 
6! ' 4! 4 


n^2 

+ ^ 


n\Z 

3i 


13 


OO cannot hold. 


6 


5! ' 6 ' 2 

0, which cannot happen. As a consequence, both p\xlP\z 


Case 2.2: Exactly one of Pii/Pi 3 ,Pi 2 /Pi 3 —)• 00 . If Pujpi^ —)• 00 and P 12 IP 13 7 ^ 00 . It implies 
that A^fg/A^fg —^ 0. Denote P 12 /P 1 Z —^ c. If c > 0 then as P 12 A 612 /Piz^^iz ® 2 , ^&i 2 /^^iz 
02 / 0 . From the previous case 3.1, we know that at least one of |/ig |, |/i 2 |, / 13 1 will not converge to 00 . 
If I /ig I 7 ^ 00 , then iTn ,3 implies that 


3 I (^3 


1 K: 

- 1 -^ + 

4! 2 2 


n ^2 


+ 


which means that at least one of | /i 2 |, | /ig 
both I/i 2 1 ) 1^3 I 7 ^ 00 . Denote ^ /12 



7^ OO. 

and /ig 


{K^Y\ pU^o'12) 


As 


2 J pU^O^zY 


pU^o^Y 


pU^e^zY 

hs. Now, KnA 


7^ 


—> 0 , 

00 for all 1 < j < 6 , we have 
Kn, 2 ,Kn,z, and Kn^i yield the 



following system of polynomial equations 


1 , /I , \ Oo 

2+/i3+(2+/^2]- = 0, 


1 /I 3 /lo 

-1- — H —- + 

5! 6 2 


1 , /I , \ ai 

4! 2 2 V4! 2 r ’ 


5'. 6 ^ 2 


By converting the above equations into polynomial equations and using Groebner bases, we obtain that the bases contain an equation in terms of $c$ with all positive coefficients, which does not admit any solution since $c > 0$. Therefore, the above system of polynomial equations does not admit any real solutions $(h_2, h_3, c, a_2)$ where $c > 0$. Therefore, the assumption $|h_1^n| \not\to \infty$ does not hold. As a consequence, $|h_1^n| \to \infty$.

I (A6^ 

Now, if |/i2 I 7^ 00 then ATn.s demonstrates that \h'^\ 00. Henee, 1 yields ' 


hi- 


on '|2 

'I 3 I 




Pl{Ae 

00 . As AOii/A9l —^ 0 andp”;^/p ”3 —)■ 00 , we aehieve /i”(A 0 ”]^)®/p” 3 (A 0 ” 3 )* —)■ 0 for all 3 < i < 6 , 
{hlfpl{Aeiy/pl{Aeiy ^ O for all 4 < i < 6 , and {hlfpHAOlf /pHAe^f 0. With 
these results, by denoting /i 2 —)• /i 2 and /13 —)■ h-^, iTn, 3 , 4 , Kn, 5 , yield the following system 
of polynomial equations 


1 




^ + /i3+Q + /i2)^-0, 

^ + /i3+ + ^ = 0, 


hi 


1 /13 

-h — H- 2. q- 

4! 2 2 

1 /13 hi 

—+—+—+ 
5! 6 2 


hi h\ 

+ — H —- + — + 
6 ! ^ 4! 4 6 


1 


+ 


6 ! 4! 


.3! 

1 /i 2 hi 

- 1 —- + — 

4! 2 2 

1 /i3 hi 

5! 6 2 

/i2 hi hi 


= 0 , 


= 0 , 


= 0 . 


We can check again that the Groebner bases contain a polynomial of $c$ with all positive coefficients. Therefore, the possibility that $h_2^n$ is finite does not hold. As a consequence, $|h_2^n| \to \infty$. However, as both $|h_1^n|, |h_2^n| \to \infty$, we get $|h_3^n| \to \infty$, which is a contradiction. Therefore, $c > 0$ cannot happen. It implies that $p_{12}^n/p_{13}^n \to c = 0$.

If 02 / Othen A 6 I]^ 3 /A 6 I ]^2 ^ 0- ^ oo>PnA 6 i^i/Pi 2 A 6 »f 2 ,K 3 ^^r 3 /Pi 2 ^^i 2 

are finite, with the same argument as that of Case 3.1, we get the eontradietion. Thus, 02 = 0. How¬ 


ever, as 


pIAQ 


\plAei 

whieh is a eontrad 


< 


plAQ 


12 


, it implies that plAOl/plAOl 


n Ann I’ .-r--x-ii—ii, r-i.,—i., ■ 0. It folloWS that Ol + 02 = 0, 

Pl3'^^13 I 

ietion to the faet that 01 + 02 = 1. Overall, the possibility that pl/pl 


00 and 

pHpI 7 ^ 00 eannot happen. 

As a eonsequenee, pHpi 00 and pH pi —)• 00 . Using the same argument as before, eventu¬ 

ally, we get to the ease when p”^/p ”3 —)■ Oandoi = 0. If A 0 ”^/A 0”3 is finite then p”^(A 0 "^y/p” 3 (A 0 ” 3 )'^ - 
0 for all 1 < j < 6 . As we also havep” 2 ('^^r 2 y/Pi 3 (^^” 3 y C) for all 1 < j < 6 , Kn, 2 , Kn, 3 , Kn,4 
demonstrate that $|h_1^n|, |h_2^n| \to \infty$. However, it also implies that $|h_3^n| \to \infty$, which is a contradiction.


Therefore, 




11 




oo. 


If /12 is finite then at least one of hi and /13 is finite. First, we assume that hi is finite. Now, 
if Pii{A6ii)‘^/pi^{A9i^)‘^ 7 ^ 0 then PiiidiiY/Pi^iOi^y becomes infinite for all j > 3. Consider 

Kn 2 — Kn 1, we achieve —r + —>• 0. Similarly, consider Kn 4 — Kn 3 H —Kn 2, we obtain — + 

’ ’ 3! ’ ’ 3 ’ 5! 

un 1 

— H-^ 0, which contradicts to — + /i” —> 0. Therefore, Pii{A6iif‘/p^.^{A6i^y —>■ 0. 

6 2 1 ^ 

From iTn,l^ it shows that /13 + - —)• 0. Combining this result with iT„ 2 , Kn, 3 , Kn,i: Kn,b, we obtain 
Pii{A6iiy/Pi^{A9i^y are finite for all 2 < j < 6 . However, as is infinite, we obtain 

Fii(^ii)^/Fi 3 (^ 13 )^ 0- Combining it with iT„^ 2 ^we obtain ^3 “1“ ^ ^ which contradicts /ig + 

1/2 —)• 0. As a consequence, /i” is not finite, which also implies that /13 is finite. 

However, it means that pii{A9iiy/pi^{A9i^y 


0 for all 2 < j < 6. If 


0 then 2 cannot happen as A^fg/A^Jg is infinite. Hence, /i” 


p\i{Ad 


111 


/ig + 1/2 —)■ 0. From iTn, 4 > since hi is infinite, we achieve (/i" 


„,2P?l(A0fg^4 


pU^Ohj 

0, which implies 


that K 




P-3(A0-3)4 

0. Combining this result with iTn, 2 > we achieve ^3 + ^ 


is finite. It also means 


0, which contradicts 


/ig + 1/2 ^ 0. Thus, the possibility that h^ is finite does not hold. Therefore, Ifi^l ^ cc. Using 
the same line of argument as before, we also obtain h'l , /ig are infinite, which is a contradiction. As a 
consequence, case 3.2 cannot hold. 


Case 2 . 3 : At least one of Pii/Pig and Pi2lP\3 0 they are both finite. As ai + 02 = — 1 , it 
means that at least one of oi, 02 is different from 0. Without loss of generality, we assume oi 7^ 0 . It 
implies that p"2^^r2/Fii^^ii ^2/01 7^ 00 and pggA^gg/p^gA^fg —)■ l/oi 7^ 00. Since pgj^/pgg is 

finite, Pi^jPii 7^ 0 . Additionally, if 02 = 0 ih&n Pi^A 9 i^/Pi2A9i2 —>• 00 and p^gA0”g/pg2A0g2 
00, which is a contradiction to A 0 fg| < Pg2l^^i2l- Therefore, 02 7^ 0 . 

Ifpg2/Fii A {0) 00} then by dividing the numerator and denominator of F'^{ 9 i, by p”g(A0g]^)" 
for all 1 < a < 6 and letting n —)• 00, we achieve the scaling system of polynomial equations dUl when 
$r = 6$, which we already know does not have any solution.

If P12/P11 —^ 00 then we can argue in the same way as that of Case 3.2 by dividing both the 
numerator and denominator of Vi) by p”g(A0fg)" for all 1 < a < 6 to get the contradiction. 

If P 12 IP 11 implies that Piijpi2 00 and P13/P12 —>• 00. Now, we also have 

Pi^A9i^/Pi2A9i2 —)• l/a2 y 00 and piiA9ii/pi2A9i2 —)• ai/a2 7^ 00. Therefore, we can argue 

in the same way as that of Case 3.1 by dividing both the numerator and denominator of F^{ 9 i, r;]*) by 
p”2(^^ii)'^ to get the contradiction. Therefore, case 3.3 cannot happen. 

Case 2.4: Both $p_{11}^n/p_{13}^n, p_{12}^n/p_{13}^n \notin \{0, \infty\}$. By dividing both the numerator and denominator of $F'_\alpha(\theta_1^0, v_1^0)$ by $p_{13}^n (\Delta\theta_{13}^n)^\alpha$ for all $1 \le \alpha \le 6$, we achieve the scaling system of polynomial equations
dHll when r = 6, which does not admit any solution. 

As a consequence, $i^* \notin \{1, \ldots, k_0\}$. Therefore, $i^* \in \{k_0 + 1, \ldots, k_0 + m\}$. However, since $m \le 2$, with the observation that when $k_0 + 1 \le i \le k_0 + m$, each support point $(\theta_i^0, v_i^0)$ has at most 2 points converging to it, we can use the same argument as that of Case 1 to get the contradiction. Overall, we get the conclusion of our theorem.

PROOF OF THEOREM UH (a) As we have seen the proof of part (b) in Theorem 14.51 condition 
> 0 plays an important role to get the inequality in (l52l) to yield a contradiction. If this condition 

S2 —1 S2 — 1 

does not hold, then it is possible that ^ )^/ ^ —>• 0. Therefore, we need a 

j=si j = Sl 

special treatment for this situation. For the simplicity of our argument later, we first consider the case 
fc* = 1 to illustrate why Wj may not be the best lower bound in general. All the notations in this proof 
are the same as those of part (b) of the proof of Theorem 14.51 Going back to Equation (ISTT) . we divide 
our argument into two cases: 

Case 1: If cousin set is conformant, i.e, share the same sign for all i G Then, we can 
proceed the proof in the same fashion as the part following Equation (l52l) in part (b) of the proof of 
Theorem |43] 


Case 2: If cousin set is not conformant, from the assumption of part (a) of Theorem 14.61 and 
k* = 1, we should have \Ig-^ \ = 1. So, S 2 = si + 2. Erom Case 2 of part (b) of Theorem 14.51 we have 
Apr/dnew(pf* , , nf. , mf. ), pf A</dnew (pf*, 0 ?., ), pr (A^r) V4ew , 0 ?,, ), 

0f,,v^,,m‘^*) —)• 0 for all Si < i < S 2 — 1- Combining these results with the 


Pr«)Vrfnew(pr*, 

assumption that 


0, we obtain 


S2-1 


(pr*, 6»r*, vf ., mr*) ^ 0 . 




Since 


x; p”(Amr)^/finew {Pi*, 0i* 
j=si 


, mr*) 74 0 , we get 


S2 —1 S 2 — I 

Zn := p]Am]/ ^ 0 . 

j = si j = Sl 

Without loss of generality, we assume |Amr^_,_]^| > [Am^J for infinitely many n, which to avoid 
notational cluttering we also assume it holds for all n. Denote Amr^/Amr^+i —)• a. Divide both the 
numerator and denominator of by Am^^ and let n —)• 00 , we obtain p°^ + p^^^iO = 0. Similarly, 
from (I 5 TI) . by dividing both the numerator and denominator of Vn/Un for (Am^J^, we obtain p°^ + 
Psi+iO^ = 0. Therefore, we achieve a system of equations 

$$p_{s_1}^0 + p_{s_1+1}^0 a = 0,$$

$$p_{s_1}^0 m_{s_1}^0 + p_{s_1+1}^0 m_{s_1+1}^0 a^2 = 0.$$

This is actually equation (fT^ when k* = 1 and r = 2. Solving the first equation, we obtain 
$a = -p_{s_1}^0/p_{s_1+1}^0$. However, by substituting this result into the second equation, we get $p_{s_1}^0 m_{s_1+1}^0 + p_{s_1+1}^0 m_{s_1}^0 = 0$. We have the following two small cases:

Case 2.1: Assume we have $p_{s_1}^0 m_{s_1+1}^0 + p_{s_1+1}^0 m_{s_1}^0 \ne 0$; then the system of equations does not have any solution. Hence, in this case, the lower bound of $V(p_G, p_{G_0})$ is still $W_2^2(G, G_0)$.
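The substitution step above can be checked numerically. The sketch below is only an illustration under the assumption that the two-equation system reads $p_1 + p_2 a = 0$ and $p_1 m_1 + p_2 m_2 a^2 = 0$, with $(p_1, m_1)$ and $(p_2, m_2)$ standing for the components indexed by $s_1$ and $s_1 + 1$; the function name and the numerical values are hypothetical.

```python
from fractions import Fraction


def second_equation_after_substitution(p1, p2, m1, m2):
    """Solve p1 + p2*a = 0 for a, then evaluate p1*m1 + p2*m2*a^2 at that a."""
    a = -p1 / p2  # unique solution of the first equation
    residual = p1 * m1 + p2 * m2 * a ** 2
    return a, residual


p1, p2 = Fraction(1, 4), Fraction(3, 4)

# Hypothetical values with p1*m2 + p2*m1 == 0: the second equation is then satisfied.
print(second_equation_after_substitution(p1, p2, Fraction(1), Fraction(-3)))

# Hypothetical values with p1*m2 + p2*m1 != 0: the second equation fails.
print(second_equation_after_substitution(p1, p2, Fraction(1), Fraction(1)))
```

In both runs the residual equals $(p_1/p_2)(p_2 m_1 + p_1 m_2)$, which is why the solvability of the system reduces to the sign condition that separates Case 2.1 from the remaining case.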
Case 2.2: Assume we have $p_{s_1}^0 m_{s_1+1}^0 + p_{s_1+1}^0 m_{s_1}^0 = 0$. We have two important steps:

Step 1 - Construction to show that $V(p_G, p_{G_0})$ cannot be lower bounded by $W_r^r$ as $r < s = 3$:

We construct Gn such that both Zn and Un/Vn can go to 0. We choose Gn = X] P7^{e^,v^,m^) such 

2=1 

that {pf, 9^, v^) = (p°, 6^, v^) for all 1 < i < k^, for all i 0 {si, si + 1}. Choose Am” = 

Sl + 1 Si + l 

— and Am”^_,_]^ = 1 /n, then we can check that Y1 p'j^fn'j = p”?T^°(Am”)^ = 0 . 

j=si j=si 

Additionally, for any 1 < r < s = 3, lC{’(G„,Go) = + p^g^j^iY/rC ■ By means of Taylor 

expansion up to third order, we can check that sup \pGn{x) — PGo{x)\/Wl{Gn, Go) —)• 0 as n ^ oo 

for all X G M. With this choice of G„, we also have 

^(PGn,PGo)/B^i (G„,Go) < f \pG„ix)-PGo{^)\dx/W[{Gn,Go) ^0, 

(- 5 , 5 ) 


where 5 is sufficiently large constant. Therefore, we achieve that for any 1 < r < 3 


lim inf 

e^OGeOfc(0xr2) 


V{PG,PGo) 

W{{G,Go) 


Wi{G,Go)<e 


= 0 . 
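The cancellation used in the construction of Step 1 can be verified with exact arithmetic. The following sketch assumes the perturbations are $\Delta m_{s_1}^n = -p_{s_1+1}^0/(p_{s_1}^0 n)$ and $\Delta m_{s_1+1}^n = 1/n$ (our reading of the construction, chosen so that the first-order sum vanishes); the function name and the numerical values are hypothetical.

```python
from fractions import Fraction


def step1_sums(p1, p2, m1, m2, n):
    """Return the first-, second- and third-order sums for the assumed perturbations."""
    dm1 = -p2 / (p1 * n)   # assumed perturbation of the first component
    dm2 = Fraction(1, n)   # perturbation of the second component
    first = p1 * dm1 + p2 * dm2
    second = p1 * m1 * dm1 ** 2 + p2 * m2 * dm2 ** 2
    third = p1 * dm1 ** 3 + p2 * dm2 ** 3
    return first, second, third


# Hypothetical parameters satisfying p1*m2 + p2*m1 == 0 (the assumption of Case 2.2).
p1, p2 = Fraction(1, 4), Fraction(3, 4)
m1, m2 = Fraction(1), Fraction(-3)
print(step1_sums(p1, p2, m1, m2, 10))
```

With these values the first two sums are exactly zero, while the third-order sum is not, which is in line with the need to go to third order in Step 2.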


Step 2 - We show that V{pg,PGo) Y (G, Gq): In fact, it is sufficient to demonstrate that 


lim inf 

GeOk(Bxn) 


y{PG,PGo) 

W|(G,Go) 


IC3(G,Go)< 


> 0. 


Now, by assuming the contrary and carrying out the same argument as in the proof of part (b) of Theorem 4.5 with the Taylor expansion going up to third order, we can see that Case 1 and Case 3 of part (b) are still applicable to the third order, i.e., they yield the contradiction, because they are not affected by the non-conformant conditions. Now, Case 2 yields the following results


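$$\sum_{j=s_1}^{s_2-1} p_j^n \Delta m_j^n \Big/ \sum_{j=s_1}^{s_2-1} p_j^n |\Delta m_j^n|^3 \to 0,$$

$$\sum_{j=s_1}^{s_2-1} p_j^n m_j^0 (\Delta m_j^n)^2 \Big/ \sum_{j=s_1}^{s_2-1} p_j^n |\Delta m_j^n|^3 \to 0,$$

$$\sum_{j=s_1}^{s_2-1} p_j^n (\Delta m_j^n)^3 \Big/ \sum_{j=s_1}^{s_2-1} p_j^n |\Delta m_j^n|^3 \to 0,$$

$$\sum_{j=s_1}^{s_2-1} p_j^n (m_j^0)^2 (\Delta m_j^n)^3 \Big/ \sum_{j=s_1}^{s_2-1} p_j^n |\Delta m_j^n|^3 \to 0.$$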


Remind that S 2 = si + 2 and Am”^/Am”^^^ —)• a. By dividing both the numetor and denominator 
of first above result by Am”^, second above result by (Am”J^, and third and fourth above result by 
(Am”J^, we obtain the following system of equations 

$$p_{s_1}^0 + p_{s_1+1}^0 a = 0,$$

$$p_{s_1}^0 m_{s_1}^0 + p_{s_1+1}^0 m_{s_1+1}^0 a^2 = 0,$$

$$p_{s_1}^0 + p_{s_1+1}^0 a^3 = 0,$$

$$p_{s_1}^0 (m_{s_1}^0)^2 + p_{s_1+1}^0 (m_{s_1+1}^0)^2 a^3 = 0.$$

As $(p_{s_1}^0, m_{s_1}^0) \ne (p_{s_1+1}^0, -m_{s_1+1}^0)$, the above system of equations does not admit any solution, which is a contradiction. Therefore, our assertion follows immediately.
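As a sanity check on this four-equation system, the sketch below evaluates its left-hand sides for given parameters, assuming the equations are $p_1 + p_2 a$, $p_1 m_1 + p_2 m_2 a^2$, $p_1 + p_2 a^3$ and $p_1 m_1^2 + p_2 m_2^2 a^3$, with indices 1 and 2 standing for $s_1$ and $s_1 + 1$. The function name and the numbers are hypothetical and serve only as an illustration.

```python
from fractions import Fraction


def third_order_system(p1, p2, m1, m2, a):
    """Left-hand sides of the four equations obtained at third order."""
    return [
        p1 + p2 * a,
        p1 * m1 + p2 * m2 * a ** 2,
        p1 + p2 * a ** 3,
        p1 * m1 ** 2 + p2 * m2 ** 2 * a ** 3,
    ]


# Excluded configuration (p1, m1) = (p2, -m2): a = -1 solves all four equations.
half = Fraction(1, 2)
print(third_order_system(half, half, Fraction(2), Fraction(-2), Fraction(-1)))

# A hypothetical Case 2.2 configuration: the first equation forces a = -p1/p2,
# and the third equation then fails.
p1, p2 = Fraction(1, 4), Fraction(3, 4)
print(third_order_system(p1, p2, Fraction(1), Fraction(-3), -p1 / p2))
```

The first and third equations together force $p_1 = p_2$ and $a = -1$, and the second then forces $m_1 = -m_2$, so a solution can only exist in the excluded configuration.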

General argument for $k^*$: Now, for the general case of $k^*$, we argue in exactly the same way as in Step 2 of Case 2.2. More specifically, by carrying out the Taylor expansion up to the $\overline{s}$-th order and using the same argument as in the proof of part (b) of Theorem 4.5, Case 1 and Case 3 under $W_{\overline{s}}^{\overline{s}}$ still yield the contradiction. As a consequence, we only need to deal with Case 2. In fact, it leads to the following results:

$$\sum_{j=s_1}^{s_2-1} p_j^n (m_j^0)^u (\Delta m_j^n)^v \Big/ \sum_{j=s_1}^{s_2-1} p_j^n |\Delta m_j^n|^{\overline{s}} \to 0, \tag{102}$$

for any $v \le \overline{s}$, where $u \le v$ ranges over all odd numbers when $v$ is even and $0 \le u \le v$ ranges over all even numbers when $v$ is odd. Notice that now $s_2 - s_1 \le k^* + 1$. Without loss of generality, we assume $|\Delta m_{s_1}^n| = \max_{s_1 \le j \le s_2 - 1} |\Delta m_j^n|$. Denote $\Delta m_i^n/\Delta m_{s_1}^n \to x_i$ for all $s_1 + 1 \le i \le s_2 - 1$ and $x_{s_1} = 1$. Then, by dividing both the numerator and denominator of the ratio in (102) by $(\Delta m_{s_1}^n)^v$ and letting $n \to \infty$, we achieve the following system of polynomial equations

$$\sum_{j=s_1}^{s_2-1} p_j^0 (m_j^0)^u x_j^v = 0,$$

for all $1 \le v \le \overline{s}$, where $u \le v$ ranges over all odd numbers when $v$ is even and $0 \le u \le v$ ranges over all even numbers when $v$ is odd. Since $s_2 - s_1 \le k^* + 1$, the hardest case is when $s_2 - s_1 = k^* + 1$. In this case,
the above system of equations becomes system (fT 2 ]) . From the hypothesis, we have already known that 
with that value of s, the above system of equations does not have any highly non-trivial solution, which 
is a contradiction. Therefore, the assertion of our theorem follows immediately. 

(b) Without loss of generality, we assume that $(p_1^0, m_1^0) = (p_2^0, -m_2^0)$. Now, we proceed to choose the sequence $G_n$ as in Step 1 of Case 2.2, where $s_1$ is replaced by 1. Then we can check that $\sum_{i=1}^{2} p_i^n (m_i^0)^u (\Delta m_i^n)^v = 0$ for all odd numbers $u \le v$ when $v$ is an even number, or for all even numbers $0 \le u \le v$ when $v$ is an odd number. Therefore, for any $r \ge 1$, by carrying out the Taylor expansion up to the $([r] + 1)$-th order, we can check that $\sup |p_{G_n}(x) - p_{G_0}(x)|/W_r^r(G_n, G_0) \to 0$, thereby leading to $V(p_{G_n}, p_{G_0})/W_r^r(G_n, G_0) \to 0$. As a consequence, we obtain the conclusion of part (b) of our theorem.
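The parity cancellation invoked in part (b) can be checked directly. The sketch below assumes $p_1 = p_2 = p$, $m_1 = -m_2 = m$, and the perturbations $\Delta m_1^n = -1/n$, $\Delta m_2^n = 1/n$ (our reading of the Step 1 construction when the two weights coincide); the function name and the numerical values are hypothetical.

```python
from fractions import Fraction


def parity_sums(p, m, n, max_order):
    """Return sum_i p_i * m_i^u * dm_i^v for all 0 <= u <= v <= max_order with u + v odd."""
    weights = [p, p]
    skews = [m, -m]
    deltas = [Fraction(-1, n), Fraction(1, n)]
    sums = {}
    for v in range(1, max_order + 1):
        for u in range(0, v + 1):
            if (u + v) % 2 == 1:  # u odd with v even, or u even with v odd
                sums[(u, v)] = sum(
                    w * s ** u * d ** v for w, s, d in zip(weights, skews, deltas)
                )
    return sums


# Hypothetical values; every returned sum is exactly zero.
result = parity_sums(Fraction(1, 2), Fraction(3), 7, 6)
print(all(value == 0 for value in result.values()))
```

Each sum equals $p\, m^u n^{-v}\big((-1)^v + (-1)^u\big)$, which vanishes whenever $u$ and $v$ have opposite parity, at every order.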

Remark: As we can see from the case $k^* = 1$, $W_{\overline{s}}^{\overline{s}}$ is a lower bound of $V(p_G, p_{G_0})$ under the condition (S.3), but it is not the best lower bound. More specifically, under the scenario of Case 2.1, $W_2^2$ is the best lower bound of $V(p_G, p_{G_0})$ (also $h(p_G, p_{G_0})$), while under the scenario of Case 2.2, $W_3^3$ is the best lower bound of $V(p_G, p_{G_0})$ (also $h(p_G, p_{G_0})$). It suggests the minimax optimal convergence rate $n^{-1/4}$ under the $W_2$ distance in Case 2.1 or $n^{-1/6}$ under the $W_3$ distance in Case 2.2. As $k^*$ gets bigger, such as $k^* = 2$, the minimax optimal convergence rate can be $n^{-1/8}$ under $W_4$, or $n^{-1/10}$ under $W_5$, and so on. These rates reflect how broad the convergence rate behaviors of the skew-Gaussian are.

Supplementary arguments for the proof of Theorem 4.5. Here, we give additional arguments and detailed calculations for the proof of Theorem 4.5, which is presented in Appendix I.
Detailed formulae of $A_{n,2}(x)$:


d{Gn,Go)a^j, = 


diGn,Go)aZi = 


2Ap] p^Av^ p^{Ae^f 3p^{Av^)^ 
2p^A9^ Gp-^Ae^Av^ 


d{Gn,Go)a^ji = 
d{Gn,Go)a 2 ji = 


diGn,Go)aZi = 


diGn,Go)/3^, = 


p]Av^ ^ p]{Ae^f 
2 p^Ae^Av] 

i-^y ’ 

p'^iAv^f 

4(cx°)9 ’ 


3p^{Av]f 

~W^' 


Si+l —1 


p^m^^AO^ 2p'^mf]Ae'IAv'^ 2p'^A0'^Am^ 


J J 3 
7r(cj^)2 


3 3 3 3 

7r(cr?)"^ 


7r(cJ^)2 


+ 2 m*-)(A 6 <”)^ p^Am^ 

A( \ J J J J J J J I J J 

d[Gn,Go)l32i - - 2^(^0)4 


diGn,Go)/3^, = 


diGn,Go)/3i, = 


5p"mj(Atip^ p^Am^Av^ 

8vr(a0)6 vr(a0)4 ’ 

p^^( 2 (m 0)2 + 2 )Am'}Ae^ p^((m^f + 2 m^j)Ae:^Av^ 

^ 7r(cj0)4 27r(cj0)6 


+ 2 m°)(Ai;”)^ p”m°(Am”)^ 

^ 87r(cj°)8 27r(cj°)4 


p'jii'nijf + l)Am”At;^ 
7r((Tp® 


Detailed formulae of i(x): 


(i(G,,Go)7ri = - 


n'^Av^ 

^3 3 


_ p]{^e^y sp^jAvf, _ 

2y/^{aj)^ 2\/^{aj)^ 8\/^(cjj)® vr((Tp2 

p^AO^ p^AmA 3p”A0^A< p^Av^Am^ 

_J_ J I J __ J _J_ J _J_ 

\/^((T°)3 vr(cjp2 ^/^(f70)5 7r(cr°)4 


2\/^(it?)3 2 \/^(it?)^ 8 \/^(cj^)® 


4G'n,Go)72,- = 


d(G„,Go)73" = 


d{Gn,Go)l 2 . = 


d{Gn,Go)lt, = 


2 p^A 6 ^Am^ Ap^ 

J _J_^ _|_ _ 

/ n\ >-) “r r -— 


\f 2 M(jX 


p'jAv^ 

27 &^ 


p"(A 0”)2 

2 V^{a^) 


6 p]{Av^f 

sv^icr^y 


2p^A9^Am'j 

7r(cr^)"^ 


p^A9^Av^ p^Av^AwA 

J J J I J J J 

n :—/ rjNrr A~ / n\^? 


V^ia'^y 

p'j{Avy‘^ 

SV^iay' 


7r(cj^)6 


Additional arguments for Step 1.1: We divide this step into three further cases:
Case 1.1.1: If $\theta_{i^*}^n = \theta_{i^*}^0$ for infinitely many $n$, so that, without loss of generality, we can assume $\theta_{i^*}^n = \theta_{i^*}^0$
for all n, then as Cg , Cg — ^ 0 as n — )■ oo, we achieve p”* , m”*) — )■ 0 as n —)■ oo. 

Combining this result with Cf, we get / d{p^t , 0”*, ) —)■ 0 as n —)• oo. With these results, 

C 2 yields that pf* Am^* /d{pft ., 6 *^, njl, m”*) —)• 0 as n ^ 00 . As a consequence, by summing these 
terms up, we obtain 


1 < (|Ap-I + |A0”| + |Aull| + \Am^\)/d{p^,e^.,v^.,m^) ^ 0, 


which is a contradiction. 

Case 1.1.2: If ujl = u?* for infinitely n, then we also can assume it holds for all n. From and Cg, 
we have p"*(A6*^)^/(i(p”*,0^,r;^,m”*) —> 0. Therefore, p”*A0JlAm”», m”*) —)■ 0. 
Combining these results with Cf, we get Ap'^,/d{p '^,, , m”*) —)■ 0 as n —)■ 00 . Additionally, by 

taking square of Cf, we obtain (Am”* , 0”*, uf*, m^.) —^ 0 as n —)■ 00 . These results imply 

that (i(p”*, , ujl, m”*)/(i(p”*, , m”*) —)■ 0 as n —)■ 00 , which is a contradiction. 


Case 1.1.3: If m”* = m°* for infinitely n, then we can assume that it holds for all n. Combining 
C 2 ” and Q, we obtain p” A0”* /(i(p”*, 0”*, (u”* )2, m^. ^ 


0 as n —)■ 00 . Combining this result with 


Cg and C”, we achieve p”* Au^ /d{pf*, 
n —)• 00 . This leads to a contradiction as well. 


0 and Ap”* /(i(p”*, f,m^ 


0 as 